1. Introduction. In most of the problems of statistical inference for which
we possess solutions the distribution function is assumed to depend in a known 
way on certain parameters. The values of the parameters are unknown, and the 
problems are to make inferences about the unknown parameter values. We 
refer to this as the parametric case. Under it falls all the theory based on nor- 
mality assumptions. 

Only a very small fraction of the extensive literature of mathematical
statistics is devoted to the non-parametric case, and most of this is of the last
decade. We may expect this branch to be rapidly explored, however: the
prospects of a theory freed from specific assumptions about the form of the 
population distribution should excite both the theoretician and the practitioner, 
since such a theory might combine elegance of structure with wide applicability. 
The process of development will no doubt inspire some mathematical attacks of 
considerable abstractness. There are already signs that more number-theoretic 
problems and measure-theoretic problems will enter our subject through this 
door, and perhaps even some topological ones. Some ability to think in terms of
functionals, function spaces, and metrization of function spaces will be useful in
attempting general theories of “best” tests and estimates. Toward such abstract
phases of the development the attitude of the practical statistician should
be one of tolerance, for the new theory already promises to give him many new
tools which are both simpler and of wider use.

1 Parts of this paper were used in an invited address given under the title “Statistical
inference when the form of the distribution function is unknown” before the meeting of the
Institute of Mathematical Statistics on September 12, 1943 in New Brunswick, N. J.

While the maturity of the non-parametric theory is still in the future, it is well 
to remark that its beginnings go relatively far back. Of our most famous tests, 
such as Pearson’s $\chi^2$-test, Student’s test, and Fisher’s analysis of variance tests,
the oldest concerns a non-parametric problem: In 1900 Karl Pearson proposed
his $\chi^2$-criterion to test the goodness of fit of a theoretical distribution to
observations, and in 1911 he extended his $\chi^2$-method to the problem of two samples.
The first of these problems may be regarded as non-parametric if the choice of 
the theoretical distribution is not based on calculations from the data, and the 
second is without doubt a non-parametric problem. R. A. Fisher treated an 
analysis of variance problem non-parametrically at least as early as 1925, for in 
the first edition of his Statistical Methods for Research Workers we find the sign 
test. General formulations of the problems of statistical inference, and criteria
for “good” and “best” solutions² have been advanced by R. A. Fisher, Neyman,
E. S. Pearson, and Wald. These general theories were all strictly parametric 
until 1941 when Wald proposed one sufficiently broad to cover the non-parametric 
case. 

We now introduce some notation to which we shall adhere throughout this 
paper. Statistical inferences are based on measurements. The total number 
of measurements will always be denoted by n. We conceive of n random 
variables $X_1, X_2, \dots, X_n$ on which the measurements are made. The domain
of each $X_i$ can always be taken to be a set of real numbers. If vector random
variables occur, the $X_i$ will denote components. The cumulative distribution
function (c.d.f.) of the random variables will be written $F_n(x_1, x_2, \dots, x_n)$;
this is the probability that all $X_i < x_i$ simultaneously. The c.d.f. $F_n$ is then
always defined in a complete $n$-dimensional Euclidean space $W$, called the
sample space; $W$ is the space of points $E = (x_1, x_2, \dots, x_n)$. The sample
point with the random coordinates $X_1, \dots, X_n$ will be denoted by $E$.

In describing the validity of specific non-parametric tests and estimates in the
sequel it will be convenient to refer to the following classification³ of univariate
c.d.f.’s $F(x)$: $\Omega_0$ is the class of all $F$. $\Omega_2$ is the class of all continuous $F$.
$\Omega_3$ is the class of all absolutely continuous $F$, that is, those $F$ for which there
exists a probability density function $f(x)$, so that

$$F(x) = \int_{-\infty}^{x} f(t)\,dt.$$

$\Omega_4$ consists of all $F$ which may be written in the above form with $f$ continuous.


2 For a bibliography see [22].
3 The notation follows [31].

Part I. NON-PARAMETRIC TESTS


2. The randomization method of obtaining similar regions. In any problem
of statistical inference it is assumed that the c.d.f. $F_n$ of the measurements is a
member of a given class $\Omega$ of $n$-variate distribution functions; we write $F_n \in \Omega$.
$\Omega$ is called the class of admissible $F_n$. If $\Omega$ is a $k$-parameter family of functions
the problem is called parametric; otherwise, non-parametric. A statistical
hypothesis $H$ is a statement that $F_n \in \omega$, where $\omega$ is a given subclass of $\Omega$. A test
of the hypothesis $H$ consists of choosing a Borel region $w$ in the sample space
$W$ and rejecting $H$ if and only if the sample point $E$ falls in $w$; $w$ is called the
critical region of the test.

The choice of the critical region $w$ is usually⁴ made as follows: A positive
constant $\alpha$ (ordinarily about .01 or .05) is chosen and called the significance level of
the test. If regions $w$ exist for which $\Pr\{E \in w \mid F_n\}$ (the probability that the
sample point $E$ fall in $w$, calculated from the c.d.f. $F_n$) is equal to $\alpha$ for all $F_n \in \omega$,
then the choice of critical region is usually limited to this class. Such regions
are very important in the theory of testing hypotheses, and it is convenient to
have a name for them: Following the terminology of Neyman [22] in the
parametric case we shall call them similar to the sample space with respect to all $F_n$
in $\omega$, or more briefly, similar regions. A similar region is then a region $w$ for
which $\Pr\{E \in w \mid F_n\}$ is the same constant for all $F_n \in \omega$. The advantage of
using similar regions as critical regions is that the risk of rejecting the hypothesis
when it is true (type I error) is controlled: no matter what member of $\omega$ the
unknown $F_n$ happens to be, the probability of rejection of the (true) hypothesis
is exactly $\alpha$. We remark here that the problem of the existence and structure
of similar regions in the parametric case has been treated only under very heavy
restrictions and must be considered still mostly unsolved, whereas we shall see
later that in the non-parametric case it promises to be relatively simple.

When similar regions exist for a chosen $\alpha$ there is usually a large family of
them. Ideally the choice of the critical region $w$ from the family of similar
regions would be based on a complete knowledge of two functionals of $F_n$, for
$F_n \in \Omega - \omega$, that is, for those $F_n$ corresponding to the various admissible ways in
which the hypothesis can be false: the first, the probability of rejection (of
avoiding a type II error), namely $\Pr\{E \in w \mid F_n\}$, called the power function of $w$, and
the second, the relative importance of rejection in the concrete situation in which
the test is to be applied. In other words, one would like to choose the $w$ with the
power function “best” for the very specific problem at hand. However, little
has been done along this line in the non-parametric case, and, as we shall note
below, the choice of $w$ from the family of similar regions is usually made by
means of a statistic chosen on intuitive grounds.

A general method of obtaining similar regions, which we shall call the
randomization method, will now be described. The credit for originating this
method and envisioning its wide applicability belongs to R. A. Fisher, who first
used it in 1925 [3]. Consider the set $S$ of permutations on the coordinates
$x_1, x_2, \dots, x_n$ which leave invariant all the c.d.f.’s $F_n$ in $\omega$. Suppose the
number of permutations in the set $S$ is $s$; then $s$ divides $n!$. Now define for any
point $E$ in $W$ a corresponding set $\{E'\}$ of $s$ points obtained by making on the
coordinates of $E$ the permutations of the set $S$. The value of the c.d.f. $F_n$ is
then the same at all $s$ points $E'$ generated by $E$, for all $E \in W$ and all $F_n \in \omega$.
The $s$ points of $\{E'\}$ will be distinct unless the point $E$ lies in a certain region
$W_0$; $W_0$ depends on the set $S$ of permutations determined by the class $\omega$, and will
always be contained in the union of all diagonal hyperplanes $x_i = x_j$ ($i \ne j$).
A critical region $w$ is constructed by the randomization method by choosing a
positive integer $q < s$, and for every $E$ not in $W_0$, putting $q$ points of the
corresponding set $\{E'\}$ in $w$ and the remaining $s - q$ points outside $w$, by any rule
whatever, just so $w$ is a Borel set. We shall also say that a Borel set $w$ is
obtained by the randomization method if it has the structure just described except
on a (Borel) subset $w_0$ of $w$ having the property $\Pr\{E \in w_0 \mid F_n\} = 0$ for all
$F_n \in \omega$. It may be shown by the methods used elsewhere [31] by the writer
that if $\omega$ is a class of continuous c.d.f.’s then the region $w$ thus obtained is a
similar region with $\alpha = q/s$; furthermore, that under mild restrictions (roughly,
that the boundary of $w$ be a sufficiently “thin” set), at least for certain classes $\omega$,
this is the only method of obtaining similar regions.

4 Another approach to the choice of critical region will be described in section 13.

One might call the set $\{E'\}$ of points corresponding to $E$ the subpopulation of
points “equally likely” under the null hypothesis $H$, but we shall call $\{E'\}$ simply
the subpopulation corresponding to $E$. The decision as to which $q$ of the $s$ points
of the subpopulation are to be put into the critical region $w$ is usually made with
the aid of a statistic $T$ chosen on an intuitive basis. By a statistic $T$ we mean of
course a function of the sample only, not depending on the c.d.f. $F_n$, thus
$T(E) = T(X_1, \dots, X_n)$. For a suitably chosen $q$, the $q$ points of the
subpopulation $\{E'\}$ giving $T(E')$ values in a certain range (usually the $q$ largest or
$q$ smallest values) are put into $w$, and these $q$ values are then called the
“significant” values.

Before proceeding further let us consider an example illustrating all the
definitions we have introduced thus far. Suppose that on the basis of a sample of $m$
pairs $(X_i, Y_i)$, $i = 1, 2, \dots, m$, from a bivariate population with unknown
c.d.f. $G(x, y)$ we wish to test the independence of the random variables $X$, $Y$.
To fit our general notation write $Y_i = X_{i+m}$. Assuming only that the sample
is random, we have, with $n = 2m$, that the c.d.f. of the sample point $E$ is of the
form

$$F_n(x_1, \dots, x_n) = \prod_{i=1}^{m} G(x_i, x_{i+m}).$$


Now suppose we know or are willing to assume further that the unknown c.d.f.
$G(x, y)$ of the population is in a certain class $\Omega_\nu^{(2)}$ of bivariate c.d.f.’s, where
$\Omega_\nu^{(2)}$ is the bivariate analogue of the class $\Omega_\nu$ of univariate c.d.f.’s defined in section
1; thus if we knew the unknown $G(x, y)$ were continuous, we would have $G \in \Omega_2^{(2)}$.
The class $\Omega$ of admissible $F_n$ is then

$$\Omega = \Big\{F_n \;\Big|\; F_n = \prod_{i=1}^{m} G(x_i, x_{i+m});\ G \in \Omega_\nu^{(2)}\Big\},$$

where the notation $\{F_n \mid F_n \text{ of the form } \mathfrak{F}\}$ denotes the class of all $F_n$ of the
form $\mathfrak{F}$. The hypothesis of independence may now be expressed as $H$: $F_n \in \omega$,
where the subclass $\omega$ of $\Omega$ is

$$\omega = \Big\{F_n \;\Big|\; F_n = \prod_{i=1}^{m} F^{(1)}(x_i) \prod_{j=m+1}^{2m} F^{(2)}(x_j);\ F^{(k)} \in \Omega_\nu,\ k = 1, 2\Big\}.$$

The set $S$ of permutations which leave all $F_n \in \omega$ invariant is obtained by
making all possible permutations of the first $m$ coordinates $x_1, \dots, x_m$ among
themselves, and of the second $m$ coordinates $x_{m+1}, \dots, x_{2m}$ among themselves. The
total number $s$ of permutations in $S$ is thus $(m!)^2$. Making these permutations
on the coordinates of any point $E$ in $W$, we get the set $\{E'\}$ of $(m!)^2$ points. The
points of $\{E'\}$ are distinct unless $E$ lies in the region $W_0$ defined as the union
of all hyperplanes $x_i = x_j$ where $i \ne j$ and $i$, $j$ are both in the set of integers
$1, 2, \dots, m$ or else both in the set $m + 1, \dots, 2m$. Pitman [28] applied the
randomization method to this problem, using as the statistic $T(E)$ the
numerical value of the (sample) Pearsonian correlation coefficient,


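$$T(E) = |r|, \qquad r = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(x_{i+m} - \bar{y})}{\big[\sum_{i=1}^{m} (x_i - \bar{x})^2 \, \sum_{i=1}^{m} (x_{i+m} - \bar{y})^2\big]^{1/2}},$$

where $\bar{x}$ and $\bar{y}$ are the means of the first $m$ and of the second $m$ coordinates,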


the large values of $T$ being the significant ones. We note that $T(E)$ takes on
at most $m!$ different values over the subpopulation. What we previously called
a “suitably chosen” $q$ would be in the present case a multiple of $m!$, and the
choice of significance level $\alpha = q/s$ would then be limited to multiples of $1/m!$.
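
For small $m$ the enumeration over the subpopulation can be carried out directly. The following Python sketch is an illustration only, not Pitman's own computing scheme; the function names are hypothetical, the sample is assumed to have non-constant coordinates in each half, and a small tolerance `tol` is an added assumption to guard against rounding when comparing values of $|r|$. It uses the remark above: $|r|$ depends only on how the second $m$ coordinates are paired with the first, so enumerating the $m!$ pairings gives the same proportion as enumerating all $(m!)^2$ points.

```python
from itertools import permutations
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of the pairs (xs[i], ys[i])."""
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)  # assumes neither half of the sample is constant


def independence_randomization_test(xs, ys, tol=1e-12):
    """Return (observed |r|, proportion of pairings with |r| at least as large).

    Each of the m! pairings occurs m! times among the (m!)^2 points of the
    subpopulation, so the proportion over pairings equals the proportion
    over the whole subpopulation.
    """
    observed = abs(pearson_r(xs, ys))
    values = [abs(pearson_r(xs, perm)) for perm in permutations(ys)]
    at_least = sum(1 for v in values if v >= observed - tol)
    return observed, at_least / len(values)
```

The hypothesis would be rejected at level $\alpha$ when the returned proportion does not exceed $\alpha$. Since the cost grows as $m!$, this direct enumeration is only practical for quite small samples.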

The method of randomization is seen to exploit whatever symmetry properties
the $F_n$ in $\omega$ possess as a class. A special case of the general method is the method
of ranks. This gives regions of an especially simple form defined by certain
inequalities on the coordinates. Probably the only case in which the method of
ranks will ever be used is when the $F_n$ in $\omega$ have the following special kind of
symmetry: Suppose they are completely symmetrical in each of certain subsets
of the coordinates, say $t$ sets of $n_1, n_2, \dots, n_t$ coordinates, respectively, where
$\sum_{i=1}^{t} n_i = n$. We may assume the coordinates numbered so that $F_n$ is
completely symmetrical in the set $x_{p_i+1}, x_{p_i+2}, \dots, x_{p_i+n_i}$
($p_i = \sum_{j=1}^{i-1} n_j$; $i = 2, 3, \dots, t$; $p_1 = 0$), for all $F_n \in \omega$. The set $S$ of
permutations is thus generated by making all $n_i!$ permutations on the $n_i$ coordinates
$x_{p_i+1}, \dots, x_{p_i+n_i}$ ($i = 1, \dots, t$), so that the total number of permutations
in $S$ is $s = n_1!\, n_2! \cdots n_t!$.

Corresponding to the $i$-th set of coordinates in which $F_n$ is symmetrical, let
us divide the sample space $W$ up into $n_i!$ regions defined by

$$x_{p_i+1} < x_{p_i+2} < \cdots < x_{p_i+n_i}$$

and the $n_i! - 1$ other inequalities obtained by permuting the subscripts in the
above. Denote these regions by $w_{ik}$ ($k = 1, \dots, n_i!$). Let

$$w_{k_1, k_2, \dots, k_t} = w_{1,k_1} \cap w_{2,k_2} \cap \cdots \cap w_{t,k_t},$$

that is, $w_{k_1, k_2, \dots, k_t}$ is the part of $W$ common to the regions $w_{1,k_1}, w_{2,k_2}, \dots,
w_{t,k_t}$. This process divides the sample space $W$ up into $s$ disjoint regions
$w_{k_1, k_2, \dots, k_t}$, which we shall now denote simply by $w_\sigma$ ($\sigma = 1, \dots, s$). The set
$\{w_\sigma\}$ of regions covers all of the sample space $W$ except the region $W_0$ on which
certain coordinates become equal. We shall say that the sample point $E$ has
the $\sigma$-th ranking, $R_\sigma$, if $E$ falls in $w_\sigma$. We may then speak of a random variable
$R = R(E)$, the “ranking”, taking on the $s$ possible values $R_\sigma$, or the “tied”
ranking $R_0$ if $E \in W_0$.

A critical region $w$ is constructed by the method of ranks by taking $w$ to be
the union of $q$ of the regions $w_\sigma$. Those rankings $R_\sigma$ corresponding to the $q$
regions $w_\sigma$ constituting the critical region $w$ will be called the significant
rankings. Any statistic $T(E)$ used as the criterion to decide which are the significant
rankings now becomes a function of the ranking $R$ only, $T(E) = U(R)$. We
may regard the method of ranks as a simplification of the problem of testing
statistical hypotheses, in which the infinite $n$-dimensional sample space $W$ is
replaced by a finite space of $s + 1$ points $R_\sigma$. If $\Omega$ is a class of continuous $F_n$
we may ignore the point $R_0$ since then $\Pr\{R = R_0\} = 0$.
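
The passage from the sample point $E$ to its ranking $R(E)$ can be made concrete as follows. This is a minimal Python sketch under our own conventions: the function name `ranking` and the representation of a ranking as a tuple of within-block rank tuples are assumptions made for illustration, and `None` is used to stand for the tied ranking $R_0$.

```python
def ranking(point, block_sizes):
    """Map a sample point E to its ranking R(E).

    `block_sizes` lists n_1, ..., n_t, the sizes of the consecutive sets of
    coordinates in which the admissible c.d.f.'s are completely symmetrical.
    The result is a tuple holding, for each block, the ranks of its
    coordinates within the block; None is returned when two coordinates of
    a block coincide, that is, when E lies in the exceptional region W_0.
    """
    if sum(block_sizes) != len(point):
        raise ValueError("block sizes must add up to n")
    blocks = []
    start = 0
    for size in block_sizes:
        block = list(point[start:start + size])
        if len(set(block)) < size:
            return None  # tied ranking R_0
        ordered = sorted(block)
        blocks.append(tuple(ordered.index(x) + 1 for x in block))
        start += size
    return tuple(blocks)
```

For the problem of independence with $m$ pairs one would call `ranking(E, (m, m))`; any statistic computed from the returned value alone is then a statistic of the form $U(R)$.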

In the problem of independence, which we have used before to illustrate the 
definitions of this section, the method of ranks was applied by Hotelling and 
Pabst [9], who took as the statistic U(R) the numerical value of the Spearman 
coefficient of rank correlation, large values being significant. 

The method of randomization yields similar regions if $\omega$ is a class of continuous
functions. What will the method get us if we drop the continuity restriction?
In this case we can no longer ignore the possibility that the sample point $E$ fall
in the exceptional region $W_0$, for we do not have $\Pr\{E \in W_0\} = 0$. We owe to
Pitman [27] the following idea: We continue to use the subpopulation $\{E'\}$ and
a chosen statistic $T(E)$ as above, but instead of separating the points of $\{E'\}$
into two classes (significant points and non-significant points) by means of $T(E)$
we now add a third class of “doubtful” points.⁵ If the $s$ points of the set $\{E'\}$
are not distinct they are to be counted according to their multiplicities under the
process of applying the permutations of the set $S$ to the coordinates of $E$.
Suppose that the large values of $T$ are significant. Number the $s$ points of $\{E'\}$ so
that $T(E_1) \ge T(E_2) \ge \cdots \ge T(E_s)$. If $T(E_q) > T(E_{q+1})$ we call $E_1, \dots,
E_q$ significant, and the rest non-significant. However if $T(E_q) = T(E_{q+1})$, we
term all points $E'$ with $T(E') = T(E_q)$ doubtful, points $E'$ for which $T(E') >
T(E_q)$ significant, and points $E'$ with $T(E') < T(E_q)$ non-significant. This
process divides the sample space $W$ up into three regions instead of the customary
two, namely, a rejection region $w_R$, an acceptance region $w_A$, and a doubtful
region $w_D$. It is a special case of the following procedure: For every set $\{E'\}$
define positive integers $m_R = m_R(\{E'\})$ and $m_A = m_A(\{E'\})$ such that $m_R \le
q$, $m_A \le s - q$, and put $m_R$ of the points $E'$ in $w_R$, $m_A$ of the points $E'$ in $w_A$,
and the remaining $s - m_A - m_R$ of the points $E'$ in $w_D$, in any way so that $w_R$
and $w_A$ are Borel regions. When any $E'$ is assigned to $w_R$ or $w_A$ it is to be counted
according to its multiplicity as defined above, if $\{E'\}$ contains less than $s$
distinct points. It may be shown that with $\alpha = q/s$, $\Pr\{E \in w_R \mid F_n\} \le \alpha$ and
$\Pr\{E \in w_A \mid F_n\} \le 1 - \alpha$ for all $F_n \in \omega$, that is, whenever $H$ is true.

5 Instead of the terms significant, non-significant, doubtful, Pitman uses discordant,
concordant, neutral.
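
The three-way classification just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than Pitman's own formulation: the function name is hypothetical, the values of $T$ over the subpopulation are supposed to be supplied already repeated according to multiplicity, large values are taken as significant, and exact equality of values is tested directly, which is only safe when $T$ is computed without rounding error (for example on integer data).

```python
def three_way_decision(t_observed, t_subpopulation, q):
    """Classify the observed point as significant, doubtful or non-significant.

    `t_subpopulation` holds T(E') for all s points of the subpopulation,
    each repeated according to its multiplicity; `q` fixes the level q/s.
    """
    s = len(t_subpopulation)
    if not 0 < q < s:
        raise ValueError("q must satisfy 0 < q < s")
    ordered = sorted(t_subpopulation, reverse=True)  # T(E_1) >= ... >= T(E_s)
    t_q, t_next = ordered[q - 1], ordered[q]         # T(E_q) and T(E_{q+1})
    if t_q > t_next:
        # The q largest values are cleanly separated from the rest.
        return "significant" if t_observed >= t_q else "non-significant"
    # T(E_q) = T(E_{q+1}): every point tied at this value is doubtful.
    if t_observed > t_q:
        return "significant"
    if t_observed == t_q:
        return "doubtful"
    return "non-significant"
```

With subpopulation values 5, 4, 4, 3, 2, 1 and $q = 2$ the boundary falls on a tie, so an observed value of 4 is classed as doubtful, while with $q = 1$ the same observed value is non-significant.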

Before closing this section on the method of randomization, we mention a few
difficulties which frequently arise when it is applied. Except for very small
samples the calculation determining whether or not the observed value $E_0$ of
the sample point $E$ belongs to the significant points of the subpopulation $\{E_0'\}$
generated by $E_0$ is usually extremely tedious. In such cases the author of the
test often gives an approximation to the discrete distribution of his statistic
$T(E)$ over the subpopulation $\{E'\}$ by means of some familiar continuous
distribution for which tables are available, the laborious exact calculation by
enumeration then being replaced by the computation of a few moments (that is,
values of certain homogeneous polynomials in the observed coordinates) and the
use of existing tables of percentage points of the continuous distribution.⁶
Barring some papers where the method of ranks is used, the justification of these
approximations is never satisfactory from a mathematical point of view, the
argument being based on a study of the behavior of two, or at most four,
moments. The only exception to the last statement appears to be a very recent
paper by Wald and Wolfowitz [42], which may point the way to genuine
derivations of asymptotic distributions for the non-rank case of the randomization
method. We shall distinguish between derivations of asymptotic distributions
and arguments based on two or four moments by saying that a distribution is
“proved” in the former case and “fitted” in the latter.

Another difficulty arises, most noticeably in the method of ranks, out of the
possibility of equality of the observed coordinates. In the distribution theory
this is usually avoided by assuming $\omega$ to be a class of continuous c.d.f.’s, so that
$\Pr\{E \in W_0 \mid F_n\} = 0$ for all $F_n \in \omega$, but in practice, since the measurements are
usually made to about three significant figures, ties do occur in the sample.
While some scattered work has been done on this question there is need for a
thorough general treatment.

6 In many cases the approximate test obtained by fitting a familiar distribution is found
to coincide with widely used tests based on normality assumptions. In such cases if the
fitting is asymptotically correct the following remarks are justified: (1) If the
non-parametric test is used in a case where we hesitate to assume normality but normality actually
exists, the non-parametric test is asymptotically as efficient as the older test assuming
normality. (2) If normality is assumed when it does not exist, no error is incurred
asymptotically when the older test is used.

In some of the work that has been done on particular non-parametric tests
it is not very clear just what the null hypothesis $H$ is. Two situations often
occur: Suppose $H$: $F_n \in \omega$ is the hypothesis we actually wish to test at significance
level $\alpha$. Let $w$ be the chosen critical region and $\omega_\alpha$ the class of $F_n$ for which
$\Pr\{E \in w \mid F_n\} = \alpha$. The two situations are (i) $\omega$ is a proper subset of $\omega_\alpha$,
and (ii) $\omega_\alpha$ is a proper subset of $\omega$. Of these (i) seems less objectionable, for then
the probability of a type I error (rejecting $H$ when true) is strictly $\alpha$, but the
probability of accepting $H$ is the same when certain alternatives ($F_n \in \omega_\alpha - \omega$)
are true as when $H$ is true. In case (ii) the probability of a type I error is not
$\alpha$ unless $F_n$ is in the subclass $\omega_\alpha$ of $\omega$; thus there might be a much higher
probability than $\alpha$ of rejecting $H$ when it is true, if the true $F_n \in \omega - \omega_\alpha$. To
illustrate situation (i) consider K. Pearson’s $\chi^2$-test for goodness of fit of a
theoretical distribution $F_0(x)$ to a sample $E$. Suppose $E$ is from a univariate population
whose true c.d.f. is $F(x)$. If $F$ has the property that for the intervals $I_j$ defined
in section 3, $\int_{I_j} dF = \int_{I_j} dF_0$, $j = 1, 2, \dots, N$, then the probability of
rejection is the same as when the hypothesis is true. An example of (ii) might
occur if we wish to test whether the means of two univariate populations are the
same. If we use one of the tests of section 4 in which the probability of rejection
is calculated under the assumption that the distributions of the populations are
the same, then we do not know that the probability of a type I error is $\alpha$, for the
samples might come from two populations with the same mean but different
distributions.


3. Goodness of fit. Randomness. The non-parametric case of testing goodness
of fit is the following: On the basis of a sample $E$ from a population with
c.d.f. $F(x)$ known to be a member of some $\Omega_\nu$, we wish to test whether $F = F_0$,
where $F_0$ is a given c.d.f. The class of admissible c.d.f.’s $F_n$ is

$$\Omega = \Big\{F_n \;\Big|\; F_n = \prod_{i=1}^{n} F(x_i);\ F \in \Omega_\nu\Big\},$$

and the hypothesis $H$ specifies that $F_n \in \omega$, where

$$\omega = \Big\{F_n \;\Big|\; F_n = \prod_{i=1}^{n} F_0(x_i)\Big\}.$$

K. Pearson’s $\chi^2$-test [25] consists of choosing an integer $N$, dividing the $x$-axis
up into a set $\{I_j\}$ of disjoint intervals ($j = 1, 2, \dots, N$), and using as statistic
$T(E)$ the Pearsonian chi square,

$$\chi_P^2 = \sum_{j=1}^{N} [m_j - \mathcal{E}(m_j)]^2 / \mathcal{E}(m_j),$$

where $m_j$ is the number of observed coordinates of $E$ in $I_j$, and
$\mathcal{E}(m_j) = n \int_{I_j} dF_0$. Large values of $\chi_P^2$ are regarded as significant. Exact significance
levels for $\chi_P^2$ could be obtained by considering its distribution over the
subpopulation $\{E'\}$ generated by the sample. This process would lead to the
multinomial distribution of the $m_j$ mentioned in the usual derivations of the
asymptotic distribution of $\chi_P^2$ (for $n \to \infty$ with $N$ fixed). Pearson himself found
this asymptotic distribution, namely the $\chi^2$-distribution with $N - 1$ degrees of
freedom. In studying the problem of a “best” choice of the set $\{I_j\}$ of intervals,
Mann and Wald [17] adopted a non-parametric treatment, with $\nu = 2$ for the
class $\Omega_\nu$ above.
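
The computation of $\chi_P^2$ from a sample may be sketched in Python as below. The function name, the description of the intervals $I_j$ by their interior cut points, and the passing of $F_0$ as a callable are all assumptions made for the illustration; the intervals are taken closed on the right, which is immaterial when $F_0$ is continuous, and every interval is assumed to have positive probability under $F_0$.

```python
from bisect import bisect_left


def pearson_chi_square(sample, cut_points, cdf0):
    """Pearsonian chi square for the hypothesis F = F_0.

    `cut_points` are the increasing interior end points c_1 < ... < c_{N-1}
    of the N intervals (-inf, c_1], (c_1, c_2], ..., (c_{N-1}, +inf), and
    `cdf0` is the hypothetical c.d.f. F_0, evaluated as Pr{X <= x}.
    """
    n = len(sample)
    n_intervals = len(cut_points) + 1
    counts = [0] * n_intervals          # the observed numbers m_j
    for x in sample:
        counts[bisect_left(cut_points, x)] += 1
    bounds = [0.0] + [cdf0(c) for c in cut_points] + [1.0]
    expected = [n * (bounds[j + 1] - bounds[j]) for j in range(n_intervals)]
    return sum((m - e) ** 2 / e for m, e in zip(counts, expected))
```

For instance, with $F_0$ uniform on $(0, 1)$, the single cut point $0.5$ and the sample $0.1, 0.2, 0.3, 0.6$, the observed numbers are 3 and 1 against expectations 2 and 2, so that $\chi_P^2 = 1/2 + 1/2 = 1$.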

Another test not depending on a choice of intervals $I_j$ could be made by using
confidence belts for $F$ as described in section 9 and rejecting $H$ at the $\alpha$ level of
significance if the graph of $F_0$ is not covered by the belt with confidence
coefficient $1 - \alpha$.

The problem of randomness is usually non-parametric; in the univariate case
the class $\omega$ of this problem is identical with the class $\Omega$ of the preceding. The
index $\nu$ and the class $\Omega$ for the problem of randomness would depend on the
specific situation in which it arises. With two exceptions [42, 52], all tests of
randomness proposed thus far have been functions of runs in the sample. Two
kinds of runs have been considered, runs up and down, and runs above and below
the median [1, 4, 14, 19, 32, 44, 51]. We note that the set $S$ of permutations
determined by $\omega$ is the set of all $n!$ permutations on the $n$ coordinates of $E$.
Suppose now $\nu = 2$. The proof [31] that all similar regions $w$ have the
randomization structure applies to this problem. On the other hand such a region $w$
has the property $\Pr\{E \in w \mid F_n\} = \alpha$ for any $F_n$ which is completely symmetrical
in the coordinates. Difficulty (i) discussed at the end of section 2 now arises if
$\Omega$ contains such symmetrical alternatives. The definition of an appropriate
class $\Omega - \omega$ of alternatives and the question of the power of tests against the
alternatives make the problem of randomness a difficult one. Beyond these
few remarks we refer the reader to an expository paper by Wolfowitz [51]
devoted to the problem in the previous issue of this journal, and to a paper by
Wald and Wolfowitz [42] in the present issue. The latter paper is one of the
exceptions, previously mentioned, not based on the method of ranks.


4. The problem of two samples. Suppose $X_1, \dots, X_{m_1}$ and $Y_1, \dots, Y_{m_2}$ are
samples from univariate populations with c.d.f.’s $F(x)$ and $G(x)$ respectively,
where we assume $F, G \in \Omega_\nu$, and that we wish to test the hypothesis that $F = G$.
Write $Y_i = X_{i+m_1}$, so that with $n = m_1 + m_2$ we have

$$\Omega = \Big\{F_n \;\Big|\; F_n = \prod_{i=1}^{m_1} F(x_i) \prod_{j=m_1+1}^{n} G(x_j);\ F, G \in \Omega_\nu\Big\},$$

$$\omega = \Big\{F_n \;\Big|\; F_n = \prod_{i=1}^{n} F(x_i);\ F \in \Omega_\nu\Big\}.$$

The set $S$ of permutations determining the subpopulation $\{E'\}$ consists of all
$n!$ permutations on the $n$ coordinates of $E$. The writer has shown [31] that no
similar regions exist in this case if $\nu = 0$, while if $\nu = 2$, 3, or 4 a similar region
necessarily has the randomization structure.

The first non-parametric attack on this problem was given [26] by K. Pearson.
The $x$-axis is divided up into intervals $I_1, \dots, I_N$ as in section 3. Let $m_{j1}$
and $m_{j2}$ be the number of measurements from the first and second samples,
respectively, falling in $I_j$, so that $\sum_{j=1}^{N} m_{jk} = m_k$, $k = 1, 2$. The statistic $T(E)$
used is

$$\chi_P^2 = (m_1 m_2)^{-1} \sum_{j=1}^{N} (m_1 m_{j2} - m_2 m_{j1})^2 / (m_{j1} + m_{j2}),$$

with large values significant. In view of the remarks at the end of the last
paragraph it would be necessary to calculate the distribution of $\chi_P^2$ over the
subpopulation $\{E'\}$ in order to get a similar region. Pearson found the asymptotic
distribution of $\chi_P^2$ under the null hypothesis to be the $\chi^2$-distribution with
$N - 1$ degrees of freedom.

A solution based on the method of randomization was proposed by Pitman
[27]; the special case of this solution for $m_1 = m_2$ was published a little earlier
by R. A. Fisher [6]. Pitman employed the numerical value of the difference of
the sample means as statistic,

$$T(E) = \Big|\sum_{i=1}^{m_1} x_i/m_1 - \sum_{j=m_1+1}^{n} x_j/m_2\Big|,$$

large values being significant. He fitted an incomplete Beta-distribution to the
subpopulation distribution of his $T(E)$, and noted that this approximation
gave a result identical with the usual $t$-test valid when the population
distributions $F(x)$ and $G(x)$ are assumed normal with equal variances.
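
For small samples the exact subpopulation distribution of Pitman's statistic can be enumerated instead of fitted. The Python sketch below is our own illustration, with hypothetical function names and an added rounding tolerance `tol`; it relies on the fact that $T(E')$ depends only on which $m_1$ of the $n$ pooled measurements are assigned to the first sample, so that each of the $\binom{n}{m_1}$ divisions stands for $m_1!\,m_2!$ of the $n!$ points of the subpopulation and the proportions agree.

```python
from itertools import combinations


def two_sample_randomization_test(first, second, tol=1e-12):
    """Return (observed |difference of means|, proportion at least as large).

    The proportion is taken over all divisions of the pooled measurements
    into groups of sizes m_1 and m_2, which is equivalent to taking it
    over the n! permutations of the coordinates.
    """
    pooled = list(first) + list(second)
    m1, m2 = len(first), len(second)
    total = sum(pooled)

    def statistic(sum_first):
        return abs(sum_first / m1 - (total - sum_first) / m2)

    observed = statistic(sum(first))
    n_divisions = 0
    at_least = 0
    for indices in combinations(range(m1 + m2), m1):
        n_divisions += 1
        if statistic(sum(pooled[i] for i in indices)) >= observed - tol:
            at_least += 1
    return observed, at_least / n_divisions
```

With first sample 1, 2, 3 and second sample 4, 5, 6 the observed difference of means is 3, and only the observed division and its mirror image attain it among the 20 divisions, giving the proportion 2/20.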

STATISTICAL INFERENCE IN THE NON-PARAMETRIC CASE 315

Turning now to tests based on the method of ranks, we mention here that one
for the case $m_1 = m_2$ was given by R. A. Fisher as early as 1925, namely the
“sign test” or “binomial series test” [3]. We may (and Fisher did) regard this
as a test of a less restrictive hypothesis, and shall describe it in section 6.
Between 1938 and 1940 several tests employing ranks were proposed for the problem
of two samples. The earliest of these, by W. R. Thompson [36], was shown to
be inconsistent (section 11) with respect to certain alternatives $F_n \in \Omega - \omega$ by
Wald and Wolfowitz [40]. These authors used as statistic $U(R)$ the total
number of runs in a sequence $V$ of $n$ elements constructed as follows: Rank the
measurements of the combined sample in order of increasing magnitude.
According as the $j$-th measurement in this rank order is from the first or second
sample, put the $j$-th element of the sequence $V$ equal to 1 or 2. In this test small
values of the statistic $U(R)$ are regarded as significant. The test is now quite
practicable (for $\nu = 2$) for certain ranges of $m_1$ and $m_2$. For $m_1$ and $m_2$ both
$\le 20$, tables by Swed and Eisenhart [34] give the 1% and 5% significant values
of $U(R)$. Wald and Wolfowitz proved that for $n \to \infty$ with $k = m_1/m_2$ fixed,
the distribution of $U(R)$ is asymptotically normal with mean $2m_1/(1 + k)$ and
variance $4km_1/(1 + k)^3$. Swed and Eisenhart computed that for $m_1 = m_2$ this
gives a very satisfactory approximation outside the range of their tables.
However, further computation needs to be done on the accuracy of this approximation
for $m_1 \ne m_2$ and one of them $> 20$.
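
The construction of the sequence $V$ and the normal approximation just quoted can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that no measurement of one sample equals a measurement of the other (as when $\nu = 2$); if such ties did occur, the sort below would break them arbitrarily by sample label. The asymptotic mean and variance are used exactly as stated in the text, with no small-sample correction.

```python
from math import erf, sqrt


def runs_statistic(first, second):
    """Total number of runs U(R) in the sequence V of sample labels."""
    labelled = sorted([(x, 1) for x in first] + [(y, 2) for y in second])
    v = [label for _, label in labelled]
    # A new run starts wherever consecutive elements of V differ.
    return 1 + sum(1 for a, b in zip(v, v[1:]) if a != b)


def runs_test_normal_approx(first, second):
    """Return (U, standardized U, approximate lower-tail probability).

    Small values of U are significant, so the lower tail of the fitted
    normal distribution serves as the approximate significance level.
    """
    m1, m2 = len(first), len(second)
    k = m1 / m2
    u = runs_statistic(first, second)
    mean = 2 * m1 / (1 + k)
    variance = 4 * k * m1 / (1 + k) ** 3
    z = (u - mean) / sqrt(variance)
    lower_tail = 0.5 * (1 + erf(z / sqrt(2)))
    return u, z, lower_tail
```

The approximation is meant for the ranges of $m_1$, $m_2$ not covered by exact tables; for very small samples such as those convenient for checking the code it should not be relied upon.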

Another test based on ranks was advanced by Dixon [2], using as statistic
$U(R)$ the random variable

$$C^2 = \sum_{i=1}^{m_1+1} \big[(m_1 + 1)^{-1} - n_i/m_2\big]^2,$$

where the integers $n_i$ are defined thus: Let $Z_1 < Z_2 < \cdots < Z_{m_1}$ denote the
measurements of the first sample arranged in rank order. Then $n_i$ is the number
of measurements in the second sample falling in the interval $(Z_{i-1}, Z_i)$, where
we define $Z_0 = -\infty$, $Z_{m_1+1} = +\infty$. Large values of $C^2$ are significant. Dixon
tabulated the 1%, 5%, and 10% significant values of $C^2$ for $m_1, m_2 = 2, 3, \dots, 10$;
for larger $m_1$, $m_2$ he fitted a $\chi^2$-distribution.
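
Dixon's statistic is easily computed from the two samples. The following Python sketch is an illustration with a hypothetical function name, and it assumes there are no ties between the samples, so that each measurement of the second sample falls strictly inside one of the $m_1 + 1$ intervals.

```python
from bisect import bisect_left


def dixon_c2(first, second):
    """Dixon's C^2 = sum over i of [1/(m_1 + 1) - n_i/m_2]^2."""
    z = sorted(first)                     # Z_1 < Z_2 < ... < Z_{m_1}
    m1, m2 = len(first), len(second)
    counts = [0] * (m1 + 1)               # n_1, ..., n_{m_1 + 1}
    for y in second:
        # bisect_left counts the Z's below y, which locates its interval.
        counts[bisect_left(z, y)] += 1
    return sum((1 / (m1 + 1) - n_i / m2) ** 2 for n_i in counts)
```

When the second sample spreads evenly over the intervals determined by the first, as with first sample 1, 3 and second sample 0, 2, 4, every term vanishes and $C^2 = 0$; when the whole second sample lies beyond the first, as with 1, 2 against 3, 4, 5, one finds $C^2 = 1/9 + 1/9 + 4/9 = 2/3$.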

A paper by Smirnoff [33, 16] suggests the following as a statistic $U(R)$: Let
$G_{m_1}(x)$ and $G_{m_2}(x)$ be the “empirical distribution functions” of the first and
second samples, that is, $m_i G_{m_i}(x)$ is the number of measurements in the $i$-th
sample $< x$ ($i = 1, 2$), and take⁷

$$U(R) = (m_1^{-1} + m_2^{-1})^{-1/2} \sup_x |G_{m_1}(x) - G_{m_2}(x)|$$

with large values significant. Smirnoff showed that if $\nu = 2$ the asymptotic
distribution of his statistic $U(R)$ is a certain c.d.f. $\Phi(\lambda)$, previously introduced
by Kolmogoroff [15]. More specifically, let $\Phi_{m_1,m_2}(\lambda) = \Pr\{U(R) < \lambda;\ \nu = 2\}$.
Then if $n \to \infty$ with $m_1/m_2$ fixed, we have $\Phi_{m_1,m_2}(\lambda) \to \Phi(\lambda)$. The definition
of $\Phi(\lambda)$ and references to tables of $\Phi(\lambda)$ are given in section 9. If instead of
assuming $\nu = 2$ we take $\nu = 0$, the risk of type I errors may be controlled to the
extent that $\Pr\{\text{rejecting } H\} \le \alpha$ for all $F_n \in \omega$, by employing Smirnoff’s theorem
stating $\Pr\{U(R) < \lambda;\ \nu = 0\} \ge \Phi_{m_1,m_2}(\lambda)$, where $\Phi_{m_1,m_2}(\lambda)$ is defined above.
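
Smirnoff's statistic may be computed as in the Python sketch below. The function names are hypothetical. Two assumptions should be flagged: the empirical distribution functions are evaluated with “$\le x$” rather than “$< x$”, which leaves the supremum unchanged because both are step functions with the same jumps; and `kolmogorov_cdf` uses the familiar series $1 - 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2\lambda^2}$ for Kolmogoroff's limiting distribution, which we take to be the $\Phi(\lambda)$ whose definition the text defers to section 9.

```python
from bisect import bisect_right
from math import exp, sqrt


def smirnov_statistic(first, second):
    """U(R) = (1/m_1 + 1/m_2)^(-1/2) * sup |G_{m_1}(x) - G_{m_2}(x)|."""
    m1, m2 = len(first), len(second)
    a, b = sorted(first), sorted(second)
    # The supremum of the difference of two step functions is attained
    # at one of the observed measurements.
    sup_diff = max(
        abs(bisect_right(a, x) / m1 - bisect_right(b, x) / m2)
        for x in a + b
    )
    return sup_diff / sqrt(1 / m1 + 1 / m2)


def kolmogorov_cdf(lam):
    """Series for Kolmogoroff's limiting c.d.f. (assumed form of Phi)."""
    if lam <= 0:
        return 0.0
    terms = max(10, int(5.0 / lam) + 1)   # later terms are below exp(-50)
    series = sum(
        (-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return 1.0 - 2.0 * series
```

An approximate significance level for an observed value `u` would then be `1 - kolmogorov_cdf(u)`, subject to the adequacy of the asymptotic distribution for the sample sizes at hand.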

A test for the problem of two samples obtained by Wolfowitz by a
modification of the likelihood ratio procedure will be discussed in section 12. When
$m_1 = m_2$ the non-parametric analysis of variance tests of the “randomized
blocks” type described in section 6 might also be used to test the more restricted
hypothesis considered in this section.

The non-parametric problem of k samples has been attacked by Welch [46], 
who used the method of randomization with the analysis of variance ratio as 
statistic T(E), and by Wolfowitz [50] with his modified likelihood ratio method. 

In this as in all the other sections where several solutions of the same problem
of statistical inference are described, the question as to the relative merits of
the various solutions arises, and in every case the question is as yet mostly or
entirely unanswered. The only easy conclusion about the tests of this section
would seem to be that the tests of K. Pearson and Pitman are not consistent with
respect to certain subclasses of the admissible alternatives, according to the
definition of section 11.

7 We use the notations sup and inf respectively for least upper bound and greatest lower
bound.

316 HENRY SCHEFFÉ


5. Independence. The classes $\Omega$ and $\omega$ defining the problem of independence
have already been stated in section 2, in which we described Pitman’s test [28]
based on the randomization method and the use of $|r|$ as statistic $T(E)$, where
$r$ is the sample value of the Pearsonian correlation coefficient. Pitman fitted
an incomplete Beta-distribution to the subpopulation distribution of $r^2$ and found
the resulting approximation for $\nu = 2$ equivalent to the usual test employing the
$t$-distribution and valid for the case of normality.

In section 2 we also mentioned the test earlier proposed by Hotelling and
Pabst [9], which is based on the method of ranks and employs the statistic
$U(R) = |r'|$, where $r'$ is the Spearman rank correlation coefficient. They proved that
for $\nu = 2$ the distribution of $r'$ is asymptotically normal if $F_n \in \omega$. Pitman's
fitting of an incomplete Beta-distribution applies also to $(r')^2$, and Kendall,
Kendall, and Smith [12] made numerical calculations indicating that this gives
a better approximation than the normal distribution. Since $r'$ is calculated
from $\Sigma d^2$, the sum of the squared rank differences, the latter may equivalently
be used as the statistic $U(R)$, small and large values of $\Sigma d^2$ being now both
significant. Kendall, Kendall, and Smith [12] found the exact distribution of
$\Sigma d^2$ for the number of pairs $m \le 8$. This work was anticipated by Olds [23],
who calculated the exact distribution of $\Sigma d^2$ for $m \le 7$, and by fitting certain
distributions for $m > 7$, gave a very useful table of the 1%, 2%, 4%, 10% and
20% significant values of $\Sigma d^2$ for $m \le 30$. It would be desirable to have these
tables extended by inclusion of the 5% values.
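
For small $m$ the exact null distribution of $\Sigma d^2$ can be obtained by direct enumeration of the $m!$ equally likely rankings. The sketch below is only an illustration of that idea, not a reconstruction of the computations of Olds or of Kendall, Kendall, and Smith; the function names are hypothetical, and the relation $r' = 1 - 6\Sigma d^2/(m^3 - m)$ used in the second function is the standard formula for the Spearman coefficient in the absence of ties.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def sum_d2_distribution(m):
    """Exact null distribution of the sum of squared rank differences for m pairs."""
    counts = Counter(
        sum((i - p) ** 2 for i, p in enumerate(perm))
        for perm in permutations(range(m))
    )
    total = sum(counts.values())  # equals m!
    return {value: Fraction(c, total) for value, c in sorted(counts.items())}

def spearman_from_sum_d2(sum_d2, m):
    """Spearman rank correlation from the sum of squared rank differences (no ties)."""
    return 1 - 6 * sum_d2 / (m ** 3 - m)
```

Since the enumeration grows like $m!$, this direct approach is practical only for small numbers of pairs, which is consistent with the limited range of the exact tables mentioned above.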

M. G. Kendall [10] proposed another measure of rank correlation whose significant
values are easier to calculate than those of $\Sigma d^2$, but since Olds' tables
for the latter are available, Kendall’s innovation does not seem to possess much 
practical advantage. Wolfowitz [50], using his modified likelihood ratio method, 
gave another test for independence and generalized it to the problem of inde- 
pendence of k random variables. 


6. Analysis of variance. We suppose that we have $n = rc$ measurements
arranged in a rectangular layout of $r$ rows and $c$ columns. The $r$ rows might
correspond to the blocks and the $c$ columns to the varieties in an agricultural
experiment. The null hypothesis $H$ is that of “no difference” in the column
effects. The measurement in the $i$-th row and $j$-th column is supposed to be on
a random variable⁸ $X_{ij}$ with c.d.f. $F^{(ij)}(x) = \Pr\{X_{ij} \le x\}$. Let us assume at
first that all the $X_{ij}$ are independent. The joint c.d.f. of the random variables
$X_{1j}, \dots, X_{rj}$ of the $j$-th column is then

$$F^{(j)}(x_1, \dots, x_r) = \Pr\{X_{1j} \le x_1, \dots, X_{rj} \le x_r\} = \prod_{i=1}^{r} F^{(ij)}(x_i).$$


8 The double subscript notation is more convenient here than that used in section 2; 
after the class $\omega$ has been defined the reader will see that the numbers $n_i$ used in section 2 to
describe the symmetry of the $F_n \in \omega$ are all equal to $c$, and the $X_{ij}$ of the present section
coincides with the X,;4; of section 2. 

The symbol $F_n$ for the joint c.d.f. of all $n$ random variables now denotes
$F_n(x_{11}, \dots, x_{1c}; \dots; x_{r1}, \dots, x_{rc})$. $\Omega$ is the class of all $F_n$ of the form

$$F_n = \prod_{j=1}^{c} F^{(j)}(x_{1j}, \dots, x_{rj}),$$

where $F^{(j)}$ is defined by the preceding equation, and all $F^{(ij)}$ are in a given class
$\Omega_\nu$. The hypothesis $H$ states that the column distributions are all the same,

$$F^{(j)}(x_1, \dots, x_r) = F^{(1)}(x_1, \dots, x_r) \qquad (j = 2, 3, \dots, c),$$

without specifying $F^{(1)}$; $\omega$ is thus the subclass of $\Omega$ comprising all $F_n$ of the form

$$F_n = \prod_{j=1}^{c} F^{(1)}(x_{1j}, \dots, x_{rj}).$$

The $F_n$ in $\omega$ may be written

$$F_n = \prod_{i=1}^{r} \left\{ \prod_{j=1}^{c} F^{(i1)}(x_{ij}) \right\}.$$

Regarding the factor in braces for fixed $i$, we see that it is left unchanged by any
permutation of the $c$ coordinates $x_{i1}, \dots, x_{ic}$. The set $S$ of permutations is
thus determined, and the subpopulation $\{E'\}$ consists of the $(c!)^r$ points obtained
by permuting among themselves the first set of $c$ coordinates, the second set of
$c$ coordinates, $\dots$, the $r$-th set of $c$ coordinates of
$E = (x_{11}, \dots, x_{1c}; \dots; x_{r1}, \dots, x_{rc})$.

The above argument leading to the subpopulation $\{E'\}$ of $(c!)^r$ points is based
squarely on the assumed independence of the $n$ random variables $X_{ij}$. Suppose
now that the $X_{ij}$ are not known to be independent, as may happen in agricultural
experiments [24]. To make the discussion concrete suppose in the $r \times c$
layout we have been considering, the rows refer to blocks (of plots) and the
columns to varieties, so that the random variable $X_{ij}$ is the yield of the $j$-th
variety on the $i$-th block. We owe to R. A. Fisher the method of including
early in the experiment a random process which leads to the same “equally
likely” subpopulation of points $\{E'\}$ obtained before in the case of independence.
This physical process, which he calls “randomization”, then permits the construction
of critical regions by the “method of randomization” in the sense we have
been using the term.

To explain the experimental process of randomization we shall imagine another
$r \times c$ layout and a random set of mappings of the two layouts onto each other.
In each block there are $c$ plots and we now assume these numbered from 1 to $c$,
the numbering to be held fixed. The second layout refers to the plots; the rows
again correspond to the blocks, but the columns now correspond to the number
of the plot in the block, thus the $i, j$ cell represents the $j$-th plot in the $i$-th block.
Now consider all 1:1 correspondences or mappings between the two layouts so
that the $i$-th row always maps onto the $i$-th row ($i = 1, \dots, r$). There are
$s = (c!)^r$ such mappings $M_k$ ($k = 1, \dots, s$). Suppose under the mapping $M_k$
the $i, t$ cell in the block-plot layout maps on the $i, j_k$ cell of the block-variety
layout, where $j_k = j_k(i, t)$, and the $i, j$ cell of the latter corresponds to the $i, t_k$
cell of the former, $t_k = t_k(i, j)$. The physical randomization process consists of
choosing the mapping $M_k$ so that all $s$ mappings have the same probability $1/s$
of being chosen. In other words, the randomized block pattern is selected in
such a way that all the $s$ possible patterns have equal probabilities of being
adopted in the experiment. Now let $Y^{(k)}_{it}$ be the yield of the $i, t$ plot if the variety
assigned to it by the $k$-th pattern is planted there, and let
$G^{(k)}(y_{11}, \dots, y_{rc}) = \Pr\{\text{all } Y^{(k)}_{it} \le y_{it}\}$ be the joint c.d.f. of the $Y^{(k)}_{it}$. In calculating the c.d.f. $F_n$ of
the $X_{ij}$ associated with the first layout we must take account of the random
process by which it is mapped onto the second:

$$\begin{aligned}
F_n(x_{11}, \dots, x_{rc}) &= \Pr\{\text{all } X_{ij} \le x_{ij}\} \\
&= \sum_{k=1}^{s} \Pr\{\text{all } X_{ij} = Y^{(k)}_{i\,t_k(i,j)}\}\, \Pr\{\text{all } Y^{(k)}_{i\,t_k(i,j)} \le x_{ij}\} \\
&= s^{-1} \sum_{k=1}^{s} G^{(k)}(x_{1\,j_k(1,1)}, \dots, x_{r\,j_k(r,c)}).
\end{aligned}$$

$\Omega$ consists of all $F_n$ of the above form with $G^{(k)}$ in a given class. The
hypothesis $H$ of “no difference” of varieties asserts that the yields of the plots
do not depend on the varieties planted on them, that is, that all $G^{(k)}$ are the same,
$G^{(k)} = G^{(1)}$, without specifying $G^{(1)}$. $\omega$ is the subclass of $\Omega$ whose members are
of the form

$$F_n = s^{-1} \sum_{k=1}^{s} G^{(1)}(x_{1\,j_k(1,1)}, \dots, x_{r\,j_k(r,c)}).$$

It is now seen that any permutation in the set $S$ previously considered merely
rearranges the terms of the above sum, so that $F_n$ remains invariant, and we
have the same subpopulation $\{E'\}$ as before.

It is to be understood henceforth that either the $X_{ij}$ are known to be independent
or else an experimental randomization has been carried out as described
above, so that in either case the above set $\{E'\}$ of $(c!)^r$ points is the “equally
likely” subpopulation.
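
To make the “equally likely” subpopulation concrete, the following minimal sketch enumerates all $(c!)^r$ layouts obtained by permuting the entries within each row and reports the fraction whose statistic is at least as large as the observed one. The function names are hypothetical, and the statistic shown (the sum of squared column totals) is merely an illustrative choice; within the subpopulation it is an increasing function of the usual analysis of variance ratio, because the total and the between-row sums of squares are unchanged by permutations within rows.

```python
from itertools import permutations, product

def sum_sq_column_totals(layout):
    """Illustrative statistic: sum over columns of the squared column total."""
    return sum(sum(col) ** 2 for col in zip(*layout))

def randomization_p_value(x, statistic=sum_sq_column_totals):
    """Fraction of the (c!)^r within-row rearrangements of x whose statistic
    is at least the observed value (large values taken as significant)."""
    observed = statistic(x)
    row_choices = [list(permutations(row)) for row in x]
    count = total = 0
    for layout in product(*row_choices):
        total += 1
        if statistic(layout) >= observed - 1e-12:  # tolerance for float data
            count += 1
    return count / total
```

The enumeration is feasible only for small layouts, since the number of points grows as $(c!)^r$.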

The first application in the literature of the randomization method is found in
R. A. Fisher's “sign test” or “binomial series test” [3] for the case of randomized
blocks with two columns ($c = 2$). Let $D_i$ be the difference $X_{i1} - X_{i2}$. The
statistic used is a function of the ranking only, namely the number of $D_i > 0$,
small and large values being significant. For $\nu = 2$ its distribution under the
null hypothesis is the binomial distribution with the $n$ and $p$ of the usual notation
equal respectively to $r$ and $\tfrac{1}{2}$. This test may be regarded as the special case when
$c = 2$ of Friedman's rank method for analysis of variance to be described below.
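
A minimal sketch of the sign test follows. The function name is hypothetical, the differences are assumed to be non-zero (as they are with probability one when $\nu = 2$), and the two-sided significance level is obtained by doubling the smaller binomial tail, which is one common convention and an assumption here rather than something prescribed in [3].

```python
from math import comb

def sign_test_p_value(d):
    """Two-sided sign test for differences d_1, ..., d_r (all assumed non-zero).

    Under the null hypothesis the number of positive d_i is binomial (r, 1/2).
    """
    r = len(d)
    positives = sum(1 for x in d if x > 0)
    tail = min(positives, r - positives)
    # Double the smaller tail probability, capping the result at 1.
    p = 2 * sum(comb(r, j) for j in range(tail + 1)) / 2 ** r
    return min(1.0, p)
```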

Fisher later [5] proposed another test for the case $c = 2$ not based on ranks,
and employing as statistic $T(E)$ the absolute value of the mean of the $D_i$ defined
above, with large values significant. The exact distribution of this statistic is
very laborious to calculate unless $r$ is very small, and K. R. Nair [20] pointed
out that the use of the numerical value of the median of the $D_i$ (or one of the two
central values when $r$ is even) had the advantage of a very easily calculated distribution
(if $\nu = 2$). The latter may be regarded as a modification of the rank
method, the method of ranks being applied not in the $2r$-dimensional sample
space as described in section 2 but in the $r$-dimensional space of the differences
$D_i$. Nair also showed that the distributions of the range and of the midpoint
of the range of the $D_i$ are very simple.
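
For $c = 2$ the subpopulation consists of the $2^r$ points obtained by interchanging the two entries within each row, which amounts to changing the signs of the $D_i$ in all possible ways. The sketch below, with a hypothetical function name, enumerates these sign changes for the statistic $|\text{mean of the } D_i|$; it illustrates why the exact calculation becomes laborious as $r$ grows, and is not offered as Fisher's own computational scheme.

```python
from itertools import product

def paired_randomization_p_value(d):
    """Fraction of the 2^r sign assignments for which |sum of signed d_i| is at
    least the observed |sum of d_i| (equivalent to using the absolute mean)."""
    r = len(d)
    observed = abs(sum(d))
    count = 0
    for signs in product((1, -1), repeat=r):
        if abs(sum(s * x for s, x in zip(signs, d))) >= observed - 1e-12:
            count += 1
    return count / 2 ** r
```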

From here on we consider the general case $c > 2$, but when we speak of distributions
they will be understood to be for the case when the null hypothesis is
true and $\nu = 2$. Welch [45] considered using as $T(E)$ the usual analysis of
variance ratio appropriate to testing for “no difference” of column effects. He
transformed this to another statistic and calculated two moments of its subpopulation
distribution. The first moment always agrees with that obtained under
“normal theory”, that is for the case $X_{ij} = C_i + Z_{ij}$, where the $C_i$ are constants
and the $Z_{ij}$ are independently normally distributed with the same variance and
zero means, but the second moment depends on the subpopulation $\{E'\}$. Here
the exact distribution of the statistic is of course in general much more tedious to
calculate than in the previous case $c = 2$; an incomplete Beta-distribution was
fitted by Welch. Welch anticipated Pitman [29] who obtained the same results
and got besides the third and fourth moments of Welch's statistic.

The method of ranks was applied by Friedman [7] who employed as statistic
$U(R)$ a quantity formed as follows: Rank each set of row entries $X_{ij}$ (for fixed $i$)
in ascending order of magnitude, and let $r_{ij}$ be the rank of $X_{ij}$, so that
$r_{i1}, \dots, r_{ic}$ is a rearrangement of the integers $1, \dots, c$. Let $\bar r_j$ be the mean rank of the
$j$-th column, $\bar r_j = \sum_{i=1}^{r} r_{ij}/r$, and take for $U(R)$

$$U = C_{rc} \sum_{j=1}^{c} \left[\bar r_j - \mathcal{E}(\bar r_j)\right]^2,$$

where $C_{rc}$ is a certain constant, and $\mathcal{E}(\bar r_j)$ is calculated under the null hypothesis.
For Friedman's choice of $C_{rc}$, $U$ may be rapidly computed from the equivalent
formula

$$U = -3r(c + 1) + 12 \sum_{j=1}^{c} \left(\sum_{i=1}^{r} r_{ij}\right)^2 \Big/ \left[rc(c + 1)\right].$$

In his paper Friedman included a proof of Wilks' that $U$ has asymptotically the
$\chi^2$-distribution with $c - 1$ degrees of freedom as $r \to \infty$. Kendall and Smith
[13] fitted to a transform of $U$ a Fisher z-distribution with continuity corrections,
obtaining a better approximation for small $r$ than the $\chi^2$-distribution. Wallis
[43] independently proposed the use of $\eta_r^2 = U/[r(c - 1)]$ as statistic and called
it the rank correlation ratio. Friedman in a later paper [8] on the subject, using
exact values he had calculated, together with the Kendall-Smith approximation,
published tables⁹ of the 1% and 5% significant values of $U$ for $c = 3, 4, 5, 6, 7$,
and sufficiently many values of $r$ so that for these $c$ and any $r$ the significant
values of $U$ are now easily available.


9 In these tables our $U$, $r$, $c$ are denoted respectively by $\chi_r^2$, $m$, $n$.
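
The computing formula for $U$ translates directly into a few lines of code. The sketch below uses hypothetical function names and assumes that there are no ties within any row, as is the case with probability one when $\nu = 2$; the second function returns the ratio $U/[r(c - 1)]$ attributed above to Wallis.

```python
def friedman_statistic(x):
    """U = -3r(c+1) + 12 * sum_j (sum_i r_ij)^2 / [r c (c+1)] for an r-by-c layout x.

    Ranks are assigned within each row; ties within a row are assumed absent.
    """
    r, c = len(x), len(x[0])
    column_rank_sums = [0] * c
    for row in x:
        order = sorted(range(c), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            column_rank_sums[j] += rank
    return -3 * r * (c + 1) + 12 * sum(s * s for s in column_rank_sums) / (r * c * (c + 1))

def rank_correlation_ratio(x):
    """U / [r (c - 1)], which lies between 0 and 1."""
    r, c = len(x), len(x[0])
    return friedman_statistic(x) / (r * (c - 1))
```

When every row ranks the columns identically, $U$ attains its maximum $r(c - 1)$ and the ratio equals 1; when the column rank sums are all equal, $U = 0$.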

After the above lengthy discussion for the “randomized blocks” case of analysis
of variance, it will perhaps suffice merely to mention that the “Latin square”
case may be similarly attacked from the non-parametric point of view, and this
has been considered by Welch [45], Pitman [29], and E. S. Pearson [24]. They
have taken as the statistic the usual analysis of variance ratio, and the work of
Welch and Pitman in calculating the first two moments of its subpopulation
distribution is even more tedious than in the “randomized blocks” case.


Part II. NON-PARAMETRIC ESTIMATION


7. Classical results on point estimation. Throughout part II the symbol
$E$ will always denote a random sample $X_1, \dots, X_n$ from a univariate population
with c.d.f. $F(x)$, where $F$ is an unknown member of a given class to be stated
in each case. The c.d.f. of $E$ is thus

$$F_n(x_1, \dots, x_n) = \prod_{i=1}^{n} F(x_i).$$

The problems of estimation can be stated without reference to the class $\Omega$ of
admissible $F_n$; $\Omega$ would be obvious in every case.

Let $\theta = \theta(F)$ be a real number determined by $F$ (a functional of $F$) for $F$ in a
certain class of univariate c.d.f.'s. Thus $\theta$ might be the mean of the distribution,
in which case $\theta$ would be defined for all $F$ possessing a first moment. We shall
not call $\theta$ a parameter in order to avoid confusion with the parametric case.
R. A. Fisher's criteria of unbiasedness and of consistency for point estimation
carry over without change from the parametric case. A statistic $T(E)$ is said
to be an unbiased estimate of $\theta$ if $\mathcal{E}(T) = \theta$. Write $E = E_n$ and $T = T_n$ to
emphasize the sample size $n$, and assume that the statistic $T_n(E_n)$ is defined for
all $n$ (or all $n >$ some $n_0$). Then we define $T_n(E_n)$ to be a consistent estimate
of $\theta$ if it converges stochastically to $\theta$, that is, if $\Pr\{|T_n - \theta| > h\} \to 0$ as
$n \to \infty$, for every $h > 0$.

In the present paragraph it will be convenient to symbolize the class of $F$ for
which the $i$-th (absolute) moment exists; we denote it by $\Omega^{(i)}$ ($i = 1, 2, \dots$).
It is known¹⁰ that a sufficient condition for the stochastic convergence of the
sample mean $\bar x$ to the population mean is that $F \in \Omega^{(1)}$. Hence for all $F \in \Omega^{(1)}$,
$\bar x$ is a consistent estimate of the population mean; furthermore it is unbiased. If
we apply this result to the random variable $Y = X^2$, we find that for all $F \in \Omega^{(2)}$,
$\sum_{i=1}^{n} x_i^2/n$ is a consistent unbiased estimate of the second moment of $F$ about the
origin. Similar statements may be made for higher moments. For $F \in \Omega^{(2)}$
one may show further that with $Q$ defined as $\sum_{i=1}^{n} (x_i - \bar x)^2$, the statistics $Q/n$
and $Q/(n - 1)$ are consistent estimates of the population variance, and the
latter is unbiased.


10 See, for example, J. L. Doob, Annals of Math. Stat., Vol. 6 (1935), p. 163. 

If there exists a number $M$ such that $F(M) = \tfrac{1}{2}$, it is called the median of the
distribution. The median $\tilde x$ of a sample of odd size is the central $X_i$ when the
$X_i$ are arranged in order of magnitude; for a sample of even size we may take
$\tilde x$ to be the average of the two central values. It may be shown¹¹ that $\tilde x$ is a consistent
estimate of $M$ for $F$ in the subclass of $\Omega_3$ for which the probability density
function $f(x)$ is continuous at $x = M$ and $f(M) \ne 0$.


8. Confidence intervals for an unknown median, for the difference of medians.
Arrange the sample in rank order and denote the result by $Z_1 \le Z_2 \le \dots \le Z_n$,
where $Z_1, \dots, Z_n$ is a rearrangement of $X_1, \dots, X_n$. The joint distribution
of the $Z_i$ (or any subset of the $Z_i$) is well known [49] if $F(x)$ is restricted
to 2,, which we now assume. From this distribution theory it is easy to show 
that for any positive integer $k < \tfrac{1}{2}n$, the probability that the random interval
$(Z_k, Z_{n-k+1})$ cover the unknown population median $M$ is

$$\Pr\{Z_k < M < Z_{n-k+1}\} = 1 - 2 I_{1/2}(n - k + 1, k),$$

where

$$I_x(p, q) = \int_0^x t^{p-1}(1 - t)^{q-1}\, dt \Big/ \int_0^1 t^{p-1}(1 - t)^{q-1}\, dt$$


is the incomplete Beta-distribution tabulated by K. Pearson. The practicability
of estimating $M$ by means of the above relation in the non-parametric case was
noted first by W. R. Thompson [35]. It is not difficult to calculate tables giving,
for various sample sizes $n$, the maximum $k$ for which $\Pr\{Z_k < M < Z_{n-k+1}\} \ge .95$
or $.99$. This has been done for $n = 6$ to $81$ by K. R. Nair [21], who listed
the maximum $k$ as well as $n - k + 1$ and $I_{1/2}(n - k + 1, k)$, so that the exact
confidence coefficient is available. Nair also gave asymptotic formulas which
are very accurate for $n > 81$.
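
The calculation of such a table is easily sketched. For integer arguments the incomplete Beta-distribution reduces to a binomial sum, $I_{1/2}(n - k + 1, k) = \Pr\{\text{at most } k - 1 \text{ of the } n \text{ observations fall below } M\}$, so that only binomial coefficients are needed. The function names below are hypothetical, and the sketch is an illustration under the restriction $k < \tfrac{1}{2}n$ rather than a reproduction of Nair's tables.

```python
from math import comb

def median_interval_coverage(n, k):
    """Pr{Z_k < M < Z_(n-k+1)} = 1 - 2 * Pr{Binomial(n, 1/2) <= k - 1}."""
    tail = sum(comb(n, j) for j in range(k)) / 2 ** n
    return 1 - 2 * tail

def max_k(n, level=0.95):
    """Largest k < n/2 with coverage at least `level`; None if no k qualifies."""
    best = None
    for k in range(1, (n + 1) // 2):  # the integers k with k < n/2
        if median_interval_coverage(n, k) >= level:
            best = k
    return best
```

For example, with $n = 10$ the interval $(Z_2, Z_9)$ has confidence coefficient $1 - 22/1024$, while $(Z_3, Z_8)$ falls below $.95$.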

It is clear how confidence intervals for the difference $d = M_2 - M_1$ of the
medians of two univariate populations with c.d.f’s known only to be in , might 
be obtained by combining two probability statements of the above kind: Let
the desired confidence coefficient be $1 - \alpha$, and form confidence intervals of the
above type for $M_1$ and $M_2$ with confidence coefficient $1 - \tfrac{1}{2}\alpha$; write them
$\Pr\{\underline{M}_i < M_i < \overline{M}_i\} \ge 1 - \tfrac{1}{2}\alpha$. Then $\Pr\{\underline{M}_2 - \overline{M}_1 < d < \overline{M}_2 - \underline{M}_1\} \ge 1 - \alpha$.
Solutions like this which are easily obtained by the combining method in many
problems are in general not very efficient.

Some work of Pitman's [27] may be regarded as a solution of the problem of
estimating the difference of medians (or other quantiles, or means) of two
populations in a case essentially more restricted than the preceding, but more
general than the corresponding parametric case in which the distributions are
assumed to differ only in location. To describe the nature of Pitman's result,
let us revert to the notation introduced at the beginning of section 4, but add to
the assumption that $F$ and $G$ are in a known class $\Omega_\nu$ the restrictive assumption
that $F$ and $G$ differ only in location, that is, that $G(x) = F(x - d)$. The problem
is the interval estimation of the unknown constant $d$. Define the random variables
$Z_i = Y_i - d$. After noting that the $m_1 + m_2$ random variables
$X_1, \dots, X_{m_1}, Z_1, \dots, Z_{m_2}$ are all independently distributed with the same c.d.f. $F$,
Pitman was able to apply his results for the problem of two samples to show how
functions $\underline{d}$ and $\overline{d}$ of $X_1, \dots, X_{m_1}, Y_1, \dots, Y_{m_2}$ could be calculated such that
$\Pr\{\underline{d} \le d \le \overline{d}\} \ge 1 - \alpha$ for $\nu = 0$, while for $\nu = 2$ the equality holds. After
fitting an incomplete Beta-distribution Pitman found that the resulting approximate
confidence intervals coincide with the well known ones employing the
t-distribution and based on the assumption that $F$ and $G$ are normal with the
same unknown variance.


11 This follows from the asymptotic distribution of $\tilde x$. See, for instance, [49], and combine
section 4.53 with Theorem (A), p. 134.


9. Confidence limits for an unknown distribution function. Consider in
an $x, y$-plane the graph $g$ of the unknown c.d.f., $g$ being the locus of the equation
$y = F(x)$, and the possibility of covering $g$ with random regions $R(E)$ depending
on the sample $E$. Wald and Wolfowitz [39] have shown how for given $n$ and $\alpha$
it is possible in a large variety of ways to define regions $R(E)$ such that
$\Pr\{R(E) \supset g\}$, the probability that the random region $R(E)$ cover the unknown graph $g$,
is $1 - \alpha$ for all $F \in \Omega_2$. Instead of describing their general method we shall
limit ourselves to a special case. This is a very neat solution, the necessary
distribution theory for which was developed earlier by Kolmogoroff [15].

Let $G_n(x)$ be the “empirical distribution function” of the sample: $nG_n(x)$ is
the number of $X_i \le x$. Define the random variable

$$D_n = \sqrt{n}\, \sup_x |F(x) - G_n(x)|,$$

and let $\Phi_n(\lambda)$ be the c.d.f. of $D_n$, $\Phi_n(\lambda) = \Pr\{D_n \le \lambda\}$. Kolmogoroff proved
that $\Phi_n(\lambda)$ is independent of $F \in \Omega_2$, and that as $n \to \infty$, $\Phi_n(\lambda) \to \Phi(\lambda)$ uniformly
in $\lambda$, where $\Phi(\lambda)$ is defined by the rapidly converging Dirichlet series

$$\Phi(\lambda) = \sum_{k=-\infty}^{+\infty} (-1)^k \exp(-2k^2\lambda^2).$$

A small table of values of the function $\Phi(\lambda)$ was given by Kolmogoroff [15], and
a larger one by Smirnoff [33]. Define $\lambda_{n,\alpha}$ from $\Phi_n(\lambda_{n,\alpha}) = 1 - \alpha$, and $\lambda_\alpha$ from
$\Phi(\lambda_\alpha) = 1 - \alpha$. Values of $\lambda_\alpha$ for $\alpha = .05, .02, .01, .005, .002, .001$ were listed
by Kolmogoroff [16]. Now $1 - \alpha$ is the probability that

$$\sqrt{n}\, \sup_x |F(x) - G_n(x)| \le \lambda_{n,\alpha}$$

if $F \in \Omega_2$. The above inequality is equivalent to

$$G_n(x) - \lambda_{n,\alpha}/\sqrt{n} \le F(x) \le G_n(x) + \lambda_{n,\alpha}/\sqrt{n} \qquad (\text{all } x).$$

If we take as $R(E)$ the intersection of the region between the graphs of the functions
$G_n(x) \pm \lambda_{n,\alpha}/\sqrt{n}$ with the strip $0 \le y \le 1$, we have $\Pr\{R(E) \supset g\} = 1 - \alpha$.
The values of $\lambda_{n,\alpha}$ have not been tabulated, but for practical purposes
of determining an unknown c.d.f. one would usually require a large $n$, and the
tabulated values of $\lambda_\alpha$ could then be used.

With $\Phi_n(\lambda)$ defined as the c.d.f. of $D_n$ for $F \in \Omega_2$, Kolmogoroff has shown further
that for $F \in \Omega_0$, $\Pr\{D_n \le \lambda\} \ge \Phi_n(\lambda)$. This gives the beautiful result that the
above confidence belt is valid in the most general case where $F \in \Omega_0$, in the sense
that for the above defined $R(E)$, $\Pr\{R(E) \supset g\} \ge 1 - \alpha$.
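
The construction of the belt can be sketched numerically. In the code below all function names are hypothetical; the series for $\Phi(\lambda)$ is truncated after a fixed number of terms, $\lambda_\alpha$ is found by bisection, and, as suggested in the text for large $n$, the asymptotic $\lambda_\alpha$ is used in place of the untabulated $\lambda_{n,\alpha}$. The belt is therefore only approximate for small samples.

```python
import math

def kolmogorov_cdf(lam, terms=100):
    """Phi(lambda) = sum over all integers k of (-1)^k exp(-2 k^2 lambda^2), truncated."""
    if lam <= 0.0:
        return 0.0
    tail = sum((-1) ** k * math.exp(-2.0 * k * k * lam * lam)
               for k in range(1, terms + 1))
    return 1.0 + 2.0 * tail

def lambda_alpha(alpha, lo=0.2, hi=3.0, iterations=60):
    """Solve Phi(lambda) = 1 - alpha by bisection on the bracket [lo, hi]."""
    target = 1.0 - alpha
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def confidence_band(sample, alpha):
    """Approximate belt: (x, lower, upper) at each distinct observation x,
    using the asymptotic lambda_alpha and clipping to the strip 0 <= y <= 1."""
    n = len(sample)
    half_width = lambda_alpha(alpha) / math.sqrt(n)
    band = []
    for x in sorted(set(sample)):
        g = sum(1 for v in sample if v <= x) / n  # empirical c.d.f. at x
        band.append((x, max(0.0, g - half_width), min(1.0, g + half_width)))
    return band
```

Between consecutive observations the empirical distribution function is constant, so the belt is a pair of step functions determined by the values returned here.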


10. Tolerance limits. An ingenious formulation and solution of a non-parametric
estimation problem was given by Wilks [47]. Let us say that an interval
$(x', x'')$ covers a proportion $\pi$ of a population with c.d.f. $F(x)$ if
$F(x'') - F(x') = \pi$. In the notation of section 8, Wilks considered the proportion $B$ covered
by the interval $(Z_k, Z_{n-m+1})$ extending from the $k$-th smallest observation
to the $m$-th largest, $B = F(Z_{n-m+1}) - F(Z_k)$. $B$ is a random variable depending
on the sample but is not a statistic since it depends also on the unknown c.d.f.
$F(x)$. However, Wilks noted that the c.d.f. $G(b)$ of $B$ is independent of $F \in \Omega_4$;
in fact, for $0 \le b \le 1$,

$$G(b) = I_b(n - k - m + 1, k + m),$$

where $I_x(p, q)$ is defined in section 8. After $k$, $m$, a fixed proportion $b$, and a
confidence coefficient $1 - \alpha$ have been chosen, the equation $G(b) = \alpha$ determines¹²
the sample size $n$ for which we can then make the following assertion without
any knowledge of $F$ except that $F \in \Omega_4$: The probability is $1 - \alpha$ that in a sample
of size $n$ the random interval $(Z_k, Z_{n-m+1})$ will cover at least $100b\%$ of the population.
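
The determination of $n$ is easily carried out numerically. For integer arguments the incomplete Beta-distribution is a binomial sum, $I_b(n - k - m + 1, k + m) = \sum_{j=n-k-m+1}^{n} \binom{n}{j} b^j (1 - b)^{n-j}$. The sketch below uses hypothetical function names and adopts the convention of taking the smallest $n$ for which $G(b) \le \alpha$, which is one of the two choices open when $G(b)$ cannot equal $\alpha$ exactly.

```python
from math import comb

def coverage_cdf(b, n, k, m):
    """G(b) = I_b(n-k-m+1, k+m), written as an upper binomial tail."""
    p = n - k - m + 1
    return sum(comb(n, j) * b ** j * (1 - b) ** (n - j) for j in range(p, n + 1))

def tolerance_sample_size(b, alpha, k=1, m=1):
    """Smallest n >= k + m with G(b) <= alpha, so that the interval from the k-th
    smallest to the m-th largest observation covers at least a proportion b of
    the population with probability at least 1 - alpha."""
    n = k + m
    while coverage_cdf(b, n, k, m) > alpha:
        n += 1
    return n
```

With $k = m = 1$, $b = .95$ and $\alpha = .05$ this gives $n = 93$: the sample range of 93 observations covers at least 95% of the population with probability at least .95.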

Wilks considered, among other extensions of his method, tolerance limits for
multivariate distributions in which the variables are known to be independent,
and the estimation of proportions in a second sample (instead of in the population)
on the basis of a first sample [48]. The latter problem involves the calculation
of $P(b; n, N, k, m)$, the probability that if a first sample of $n$ is taken and
then a second sample of $N$, a proportion $b$ or more of the second sample will lie
in the interval $(Z_k, Z_{n-m+1})$ determined from the first sample. Wilks' derivation
of $P$ requires the assumption that $F \in \Omega_4$, but a simple auxiliary argument
(related to the method of randomization by ranks) will extend the validity to
the case $F \in \Omega_2$: The complete set of $n + N$ variates is independently distributed,
each with the same c.d.f. $F \in \Omega_2$. All $(n + N)!$ possible rankings (excluding the
“tied” ranking $R_0$) as defined in section 2 then have the same probability
$1/(n + N)!$. The fraction of these rankings for which the statement about proportions
in the second sample is correct is a function of $b, n, N, k, m$ only, and
not of $F \in \Omega_2$, and this fraction is the desired $P$. Since $P$ is the same for all
$F \in \Omega_2$ it must of course coincide with the value calculated by Wilks for $F \in \Omega_4$.
It would be desirable for practical purposes to extend the validity of the tolerance
limits of the first paragraph, concerning proportions in the population, at least
to the case $F \in \Omega_2$. The extension to $\Omega_2$ would follow immediately if the intuitively
reasonable statement $1 - G(b) = \lim_{N \to \infty} P(b; n, N, k, m)$ could be
justified for $F \in \Omega_2$.


12 For fixed $b$, $G(b)$ of course takes on discrete values with $n$, so one would either choose
the $n$ giving $G(b)$ the nearest value to $\alpha$ or else the greatest value $\le \alpha$.

The multivariate case when independence is not assumed was successfully
attacked by Wald [38]. We shall describe here his solution for the bivariate
case: Let $(X_i, Y_i)$, $i = 1, \dots, n$, be a sample from a population with bivariate
c.d.f. $F(x, y) \in \Omega_4$, that is, $F$ is of the form

$$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(\xi, \eta)\, d\eta\, d\xi,$$

where $f(x, y)$ is continuous, but otherwise unknown. Plot the points $(X_i, Y_i)$
in an $x, y$-plane and choose four (small) integers $k_1, m_1, k_2, m_2$. Draw vertical
lines (parallel to the $y$-axis) passing through the points with the $k_1$-th smallest
and $m_1$-th largest abscissas. Considering only the $n - k_1 - m_1$ points inside
these vertical lines (the probability of equal abscissas is zero), draw two horizontal
lines passing through the points with $k_2$-th smallest and $m_2$-th largest
ordinates. Let $J$ be the rectangle bounded by the four lines and consider the
proportion $B$ of the population covered by the rectangle, $B = \iint_J dF(x, y)$. Then
the c.d.f. $G(b)$ of $B$ is given by the previous formula in terms of the incomplete
Beta-distribution with $k + m = k_1 + k_2 + m_1 + m_2$ and is thus independent
of $f(x, y)$. Choose $k_1, k_2, m_1, m_2, b$, and $\alpha$. Then the equation $G(b) = \alpha$ determines
the sample size $n$ for which the probability is $1 - \alpha$ that the random
rectangle $J$ will cover at least $100b\%$ of the population. Wald showed further
how a series of rectangles instead of a single rectangle might advantageously be
used in the case of highly correlated $X, Y$.

It would be most useful to have tables of $n$ corresponding to $\alpha = .05$ and $.01$,
some values of $b$ close to unity, and a few small values of $k + m$, say,
$k + m = 2, 4, \dots, 2r$. The table could then be used for the univariate, bivariate, $\dots$,
$r$-variate cases with various choices of $k_i, m_i$, such that $\Sigma(k_i + m_i) = k + m$.
Entries for $k + m = 4$ have been given by Wald [38, p. 55].


Part III. TOWARD A GENERAL THEORY


11. The criterion of consistency. All the concepts of Part III have been
carried over from, or suggested by, corresponding ones earlier developed for the
parametric theory. Consistency of point estimation was defined in section 7.
Wald and Wolfowitz [40] have generalized the notion of consistency to tests so
that it is applicable in the non-parametric case. We have heretofore specified
the hypothesis $H$ and its admissible alternatives by means of classes of $n$-variate
c.d.f.'s $F_n$. We now assume that $H$ and its admissible alternatives can be
framed as statements about one or more populations, independent of $n$. Thus
in the problem of two samples (section 4) $H$ may be taken as the statement that
the c.d.f.'s $F$ and $G$ of the two populations are the same member of $\Omega_\nu$, while the
admissible alternatives are statements that $F$ and $G$ are any two different members
of $\Omega_\nu$. Returning to the general case, we assume that a sequence of tests is
under consideration, say, $\mathcal{T}_1, \mathcal{T}_2, \dots$, such that as $j \to \infty$, the size of the sample
in $\mathcal{T}_j$ from each of the populations becomes infinite. The sequence $\{\mathcal{T}_j\}$ may
be called simply a “test” and is said to be consistent if the probability of rejection
of $H$ by $\mathcal{T}_j$ approaches unity as $j \to \infty$ whenever an admissible alternative
to $H$ is true. It has been suggested [50] that consistency is a minimal requirement
for a good test. In order to allow for the analogue of the “common best
critical regions” in the parametric theory,¹³ it would be better to define consistency
with respect to any given subset of the admissible alternatives and then
require consistency with respect to the subset appropriate to the specific situation
in which the test is to be used.

Wald and Wolfowitz [40] proved that under certain restrictions on the admissible
$F$, $G$ in the problem of two samples their test based on runs (section 4)
is consistent, while another previously proposed test is not. Judging from
their work, we may expect that, while inconsistency proofs may be easy, consistency
proofs will be difficult.


12. Likelihood ratio tests. A definition of the Neyman-Pearson likelihood
ratio criterion¹⁴ $\lambda$ for testing the hypothesis $H$ (we use the notation of section 2),
which would yield the usual result in the parametric case, would be the following:
Let $C(E; \delta)$ be a cube of edge $2\delta$ in the sample space $W$ with center at the
point $E$ and faces parallel to the coordinate hyperplanes, and let $P(E; \delta \mid F_n)$ be
the “probability put into the cube by the c.d.f. $F_n$”, that is,
$P(E; \delta \mid F_n) = \int_{C(E;\delta)} dF_n$. Define

$$\lambda(E; \delta) = \Big[\sup_{F_n \in \omega} P(E; \delta \mid F_n)\Big] \Big/ \Big[\sup_{F_n \in \Omega} P(E; \delta \mid F_n)\Big],$$

$$\lambda = \lambda(E) = \lim_{\delta \to 0} \lambda(E; \delta).$$

This definition of $\lambda$ is not useful in the non-parametric case as $\lambda$ turns out in
general to be independent of $E$; the reader may easily verify this for the problem
of two samples (section 4).

Having seen now that the likelihood ratio does not carry over to the non-parametric
case in an obvious way, we are in a position to appreciate a bold stroke
by Wolfowitz [50]. He begins by limiting the critical regions to be considered
to the relatively small class obtainable by the method of ranks (section 2). Let
$R = R(E)$ be the ranking of the sample point $E$, so that the random variable $R$
takes on the possible values $R_0, R_1, \dots, R_s$, and let $P(R_k \mid F_n) = \Pr\{R = R_k \mid F_n\}$.
Then Wolfowitz takes the likelihood ratio to be the following function of the
ranking $R$:

$$\lambda(R) = \Big[\sup_{F_n \in \omega} P(R \mid F_n)\Big] \Big/ \Big[\sup_{F_n \in \Omega} P(R \mid F_n)\Big].$$

His modified likelihood ratio test then consists of applying the method of ranks
(section 2) with $\lambda(R)$ as the statistic, small values being regarded as significant.
If $\Omega$ is a class of continuous $F_n$, all rankings $R \ne R_0$ have the same probability
$1/s$ under the null hypothesis, while $P(R_0 \mid F_n) = 0$ for all $F_n \in \Omega$. Then the
numerator of $\lambda(R)$ is $1/s$, and we may thus use the denominator of $\lambda(R)$ as
statistic with large values significant. Wolfowitz' modification has one advantage
we don't always find with the usual parametric method: it always leads
to similar regions since it is a special case of the randomization method.


13 J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical
hypotheses”, Phil. Trans. Roy. Soc. London, A, Vol. 231 (1933), pp. 289-337.
14 J. Neyman and E. S. Pearson, Biometrika, Vol. 20A (1928), p. 264.

In applying his method to examples Wolfowitz finds it necessary to resort each
time to an approximation in calculating his statistic $\lambda(R)$. Instead of taking
the “sup” over $\Omega$ as in the definition, he takes it instead over a subclass $\Omega'$ of $\Omega$
which lends itself more easily to calculation. Thus in the problem of two samples
with $\nu = 2$, whereas $\Omega$ is the class defined in section 4 with $F, G$ in $\Omega_2$, the class
$\Omega'$ is the subclass of $\Omega$ obtained by further limiting $F, G$ as follows: The $x$-axis is
divided up into a number of disjoint intervals, equal to the total number of
runs in the sequence $V$ defined in connection with the Wald-Wolfowitz test in
section 4. If the $j$-th run in $V$ is a run of 1's the restriction $G(x) = 0$ in the
$j$-th interval is imposed, if the $j$-th run is a run of 2's, $F(x) = 0$ in the $j$-th interval.
The intervals in which $F, G$ are permitted to assign positive probability
then correspond in order and number to the two kinds of runs. With this restriction
the (twice) modified likelihood ratio statistic is found to be

$$\sum_{i} \sum_{j} \left(l_{ij} \log l_{ij} - \log l_{ij}!\right),$$

where $l_{ij}$ is the number of elements in the $j$-th run of $i$'s ($i = 1, 2$). Large values
are significant. For large samples the asymptotic distribution of the statistic
falls out as a special case of a general theorem of Wolfowitz.
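
Once the sequence $V$ is in hand the statistic is simple to evaluate, since the double sum merely runs over all runs of either kind. The sketch below uses a hypothetical function name and assumes that $V$ is supplied as a sequence of the labels 1 and 2 giving, for the pooled observations in ascending order, the sample from which each came; $\log l!$ is computed as `math.lgamma(l + 1)`.

```python
import math
from itertools import groupby

def wolfowitz_run_statistic(v):
    """Sum over all runs in v of (l log l - log l!), l being the run length."""
    total = 0.0
    for _, run in groupby(v):
        l = len(list(run))
        total += l * math.log(l) - math.lgamma(l + 1)
    return total
```

A perfectly alternating sequence consists of runs of length one and gives the value 0, while long runs, which indicate separation of the two samples, give large values, in agreement with large values being significant.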

In the same paper Wolfowitz obtained modified likelihood ratio tests for the 
problem of k samples and the problem of independence of two or more random 
variables. 

In his examples the author states that the maximizing F, in 2’ is “essentially 
the same” as the maximizing F, in Q, at least for the significant rankings R, 
and for large samples. The necessity of this approximation procedure is some- 
what disturbing, as is the restriction to the method of ranks. Since it does 
not seem possible to give a definition of likelihood ratio tests sufficiently broad 
to include the non-parametric case, yet yielding the usual result in the parametric 
case, we are denied even the small comfort of saying that at least in special cases 
the method is known to yield optimum results. In some problems the set 
{R.} of rankings, corresponding to the set {w,} of regions in W which serves to 
separate the s points of the subpopulations {H’} defined in section 2, is not 
unique (consider for instance the problem of two samples when the populations
are bivariate), and in such cases the method as defined above would not give a
unique result. These remarks are intended to point out the need for further investigation
and cannot detract from the ingenuity of the method, the first
general process that has been suggested for choosing one out of the welter of
similar regions yielded by the randomization method.


13. Wald’s formulation of the general problem of statistical inference. A 
formulation of the general problem of statistical inference broad enough to cover 
the non-parametric case, and including estimation and tests as well as statistical 
problems classifiable under neither of these headings, has been given by Wald 
[37]. This formulation extends certain concepts he had applied earlier¹⁶ to the
parametric case. 

In this last section we shall permit ourselves a somewhat more abstract terminology
and notation than before. As in section 2, $E = (X_1, \dots, X_n)$ will
denote the sample; $F_n(E)$, its c.d.f.; $W$, the $n$-dimensional Euclidean space of $E$,
the sample space; and $\Omega$, the space of admissible $F_n$. Of central importance is
a given class $S$ appropriate to the problem, $S = \{\omega_\beta\}$, whose members $\omega_\beta$ are
(not necessarily disjoint) subsets of $\Omega$, $\Omega = \bigcup_\beta \omega_\beta$. To every $\omega_\beta \in S$ there corresponds
a hypothesis $H(\omega_\beta)$: $F_n \in \omega_\beta$, so that there is a 1:1 correspondence between
the members of the set $S$ and those of the set $\{H(\omega_\beta)\}$ of hypotheses. The
general problem of statistical inference, according to Wald, is the choice of a
decision function $\Delta(E)$ mapping $W$ into $S$. For every $E \in W$ a decision function
$\Delta(E)$ uniquely selects an element $\omega_\beta$ of $S$, $\omega_\beta = \Delta(E)$. Its statistical import is
that when the sample point $E$ equals $E_0$, we agree to accept the hypothesis $H(\omega_{\beta_0})$
determined by $\Delta(E_0) = \omega_{\beta_0}$.

Before introducing any further definitions let us illustrate the preceding ones.
In any problem of testing a hypothesis, the set $S$ has just two members $\omega_1$ and
$\omega_2$, which we have heretofore denoted by $\omega$ and $\Omega - \omega$, respectively. The decision
function $\Delta(E)$ then takes on just these two values; in fact, $\Delta(E) = \omega_2$
for $E$ in the critical region $w$ of the test, and $\Delta(E) = \omega_1$ for $E \in W - w$.

To illustrate the definitions in the case of point estimation, consider estimating
the median $M$ of a univariate population with c.d.f. $F(x)$. $\Omega$ would be the class
of $F_n$ of the form $\prod_{i=1}^{n} F(x_i)$ with, say, $F \in \Omega_2$ and $F'(M) \neq 0$ (which is sufficient
to insure a unique $M$). The index $\beta$ could now be identified with $M$, so that its
domain is the real line, and $\omega_\beta = \{F_n \mid M(F) = \beta\}$. The classes $\omega_\beta$ would be
disjoint in this case and each would contain an infinite number of $F_n$. The
problem of estimating the unknown $M$ may be said to be the choice of a decision
function $\Delta(E)$: When $E = E_0$ we accept $H(\omega_\beta)$: $F_n \in \omega_\beta = \Delta(E_0)$, meaning in this
case simply that we accept the statement that $M$ equals the $\beta$ determined by
$\Delta(E_0)$.


16 A. Wald, “Contributions to the theory of statistical estimation and testing
hypotheses”, Annals of Math. Stat., Vol. 10 (1939), pp. 299-326.
Suppose next that instead of the point estimation of $M$ just discussed we are
interested in the interval estimation of $M$. We define $\Omega$ as above, and now take
the index $\beta$ to consist of a pair $a$, $b$ of real numbers. An interval estimate
$a < M < b$ may be regarded as an acceptance of the hypothesis $H(\omega_{a,b})$: $F_n \in \omega_{a,b}$,
where $\omega_{a,b}$ is the subclass of $\Omega$ consisting of all $F_n$ for which $M(F)$ lies in the interval
$a < M < b$. The set $S$ now consists of all classes $\omega_{a,b}$ with $-\infty < a < b < +\infty$.
Here as in the general case of interval estimation the classes $\omega_\beta$ of
the set $S$ are not disjoint. The decision function $\Delta(E)$ adopted in section 8 is
$\Delta(E) = \omega_{a,b}$ with $a = x_k$, $b = x_{n-k+1}$, where $x_1 < x_2 < \dots < x_n$ is a rearrangement
of the coordinates $X_1, \dots, X_n$ of $E$.
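
The interval rule just described can be sketched in Python as follows. The function `median_interval` returns the pair $(a, b) = (x_k, x_{n-k+1})$ from the ordered sample. The function `coverage` uses the standard fact, stated here as an assumption rather than quoted from section 8, that for a continuous c.d.f. the number of observations below $M$ is binomial with parameters $n$ and 1/2, so that the interval contains $M$ with probability $1 - 2\sum_{i=0}^{k-1} \binom{n}{i} / 2^n$. Function names and sample values are illustrative only.

```python
from math import comb

def median_interval(sample, k):
    """Return (a, b) = (x_k, x_{n-k+1}), the k-th smallest and k-th largest values."""
    xs = sorted(sample)
    n = len(xs)
    if k < 1 or 2 * k > n:
        raise ValueError("k must satisfy 1 <= k <= n/2")
    return xs[k - 1], xs[n - k]

def coverage(n, k):
    """P(x_k < M < x_{n-k+1}) when the population c.d.f. is continuous."""
    lower_tail = sum(comb(n, i) for i in range(k)) / 2 ** n
    return 1 - 2 * lower_tail

sample = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]   # hypothetical observations
print(median_interval(sample, k=2))
print(coverage(n=10, k=2))
```

The coverage does not depend on the form of the population distribution beyond continuity, which is what makes the rule usable in the non-parametric case.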

An example of a problem neither of estimation nor testing would be the following:
Let $\Omega$ be as above. Two real numbers $A$ and $B$ ($A < B$) are given and
it is required to decide on the basis of the sample $E$ to which of the three classes
$-\infty < M < A$, $A \le M \le B$, $B < M < +\infty$ the unknown median $M$ belongs.
Here the set $S$ would consist of three disjoint classes $\omega_1$, $\omega_2$, $\omega_3$, where $\omega_1$ is
the subclass of $\Omega$ consisting of $F_n$ with $M(F) < A$, etc.

We return now to the general case. Before defining a “best” decision function
$\Delta = \Delta^*$, Wald asks that there be a given weight function $w(F_n, \omega_\beta)$ defined
on the product space $\Omega \times S$. The weight function $w(F_n, \omega_\beta)$ is a real-valued
function evaluating the loss involved in accepting $H(\omega_\beta)$, the statement that the
unknown c.d.f. of $E$ is a member of $\omega_\beta$, when the unknown c.d.f. is actually $F_n$.
If $F_n \in \omega_\beta$ we make no error in accepting $H(\omega_\beta)$, and in this case $w$ is defined to
be zero. Its value otherwise is required to be non-negative. In this theory the 
choice of the weight function is regarded as essentially not a mathematical prob- 
lem, but the choice is to stem out of the very specific situation in which the 
statistical inference is to be made. In an industrial problem w might be the 
financial loss incurred when a certain kind of error is made. 

After $w$ is given, the decision functions $\Delta$ are to be restricted to the class for
which $w(F_n, \Delta(E))$ is a Borel-measurable function of $E$ for all $F_n \in \Omega$; note that
$w$ depends on $E$ only through $\Delta$, not through $F_n$. The expected value of $w$
for a particular $F_n$ is called the risk function; it depends of course on the decision
function $\Delta$ and the weight function $w$ as well as on $F_n$. Denote it by

$$r(\Delta, w \mid F_n) = \int_W w(F_n, \Delta(E)) \, dF_n(E).$$


Since the true $F_n$ is unknown, so in general will be the true value of the risk
function associated with a particular decision function $\Delta$. We might call

$$r(\Delta, w) = \sup_{F_n \in \Omega} r(\Delta, w \mid F_n)$$

the maximum risk associated with the decision function $\Delta$. Wald defines $\Delta^*$
to be the “best” decision function relative to the weight function $w$ if the maximum
risk $r(\Delta, w)$ is minimum for $\Delta = \Delta^*$. He points out that the “best” decision
function might be defined as one which minimizes some weighted mean, taken
over all $F_n \in \Omega$, of the risk function $r(\Delta, w \mid F_n)$, but that the above definition of
the “best” decision function has certain advantages. Thus under certain restrictions
on $\Omega$ and $w$, the risk function $r(\Delta^*, w \mid F_n)$ is independent of $F_n \in \Omega$, that is,
we then know the exact value of the risk, regardless of what the true $F_n$ may be.
This is analogous to the desirable situations where confidence intervals are
known, and the probability of a false statement (to the effect that the unknown
quantity is in a given region when it is not) is then a constant independent of
the unknown quantity.
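
The definitions of risk and maximum risk can be made concrete with a deliberately small toy problem. In the Python sketch below everything specific is an assumption chosen for illustration: the space of admissible distributions is replaced by a finite family of Bernoulli distributions indexed by $p$, the two hypotheses are $p \le 1/2$ and $p > 1/2$, the weight function is the distance of $p$ from 1/2 when the wrong hypothesis is accepted, and the decision functions considered are threshold rules on the number of successes. None of this comes from Wald's paper; it only shows how a rule minimizing the maximum risk is picked out once the weight function is fixed.

```python
from math import comb

def binom_pmf(n, s, p):
    """Probability of s successes in n Bernoulli(p) trials."""
    return comb(n, s) * p**s * (1 - p)**(n - s)

def weight(p, decision):
    """Hypothetical weight function: zero for a correct decision,
    otherwise the distance of p from the boundary 1/2."""
    correct = 1 if p <= 0.5 else 2
    return 0.0 if decision == correct else abs(p - 0.5)

def risk(threshold, n, p):
    """Expected weight of the rule 'accept class 2 iff successes >= threshold'."""
    return sum(binom_pmf(n, s, p) * weight(p, 2 if s >= threshold else 1)
               for s in range(n + 1))

def max_risk(threshold, n, omega):
    """Maximum of the risk over the finite family omega."""
    return max(risk(threshold, n, p) for p in omega)

n = 11
omega = [i / 20 for i in range(21)]   # finite stand-in for the space of c.d.f.'s
thresholds = range(n + 2)             # 0 always accepts class 2, n + 1 never does
best = min(thresholds, key=lambda t: max_risk(t, n, omega))
print(best, max_risk(best, n, omega))
```

With an odd number of trials the symmetric threshold has the smallest maximum risk, since moving the threshold in either direction raises the risk on one side of $p = 1/2$.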

Wald’s theory is suggestive and formally very satisfying, but one would like 
to see some specific examples of its application to non-parametric cases. A 
discouraging aspect, not shared by the older Neyman-Pearson theory, lies in the 
very refinement that a decision function is declared best with respect to a very 
particular weight function $w$. An attractive possibility would be to impose a
metric on $\Omega$ or on a related function space, and to let $w$ be the distance function.
In the problem of two samples for example, after metrizing $\Omega_2$, the weight
assigned to accepting $H$ might be taken as the distance between $F$ and $G$ in the
notation of section 4. A suitable choice of metric might yield a weight function
appropriate to a large variety of situations. The difficulties of finding a distance
function which is intuitively satisfactory and analytically tractable in calculat- 
ing the risk function are no doubt formidable. The device of metrizing a space 
of distribution functions was used by Mann and Wald in a different connection 
[17], but their choice of distance function, while appropriate to their problem, 
would not be satisfactory here. 

Also still lacking is any general theory relating the three concepts discussed in 
Part III. The following questions have been answered, at least for some specific 
examples, in the parametric case, but are still untouched in the non-parametric 
case: Are likelihood ratio tests consistent? Is there a simple weight function
relative to which the likelihood ratio test becomes a “best” test, or asymptotically
a “best” test? If a test is “best” relative to a given weight function, with
respect to what set of alternatives is it consistent?

In conclusion let us emphasize the need for constructive methods of obtaining 
“good” and “best” tests and estimates in the non-parametric case. Recalling 
the history of the parametric case we may judge that half the battle was the 
definition of “good” and “best” statistical inference. Progress in the non- 
parametric case has been made in the direction of definition, mainly by carrying 
over or modifying criteria originally advanced for the parametric case. However,
besides criteria for “good” and “best” tests and estimates, we have in the
parametric case a large body of constructive theory which may be applied in
particular examples to yield the optimum tests or estimates; thus we have the
Fisher theory of maximum likelihood statistics for point estimation, and the constructive
theorems of the Neyman-Pearson theory for the existence of critical
regions of types $A$, $A_1$, $B$, $B_1$, and the related types of “best” confidence intervals.
The contrasting lack of any general constructive methods at present
challenges us in the non-parametric theory.
ON THE THEORY OF SAMPLING FROM FINITE POPULATIONS 


By Morris H. Hansen and William N. Hurwitz


Bureau of the Census 


I—HISTORICAL BASIS FOR MODERN SAMPLING THEORY 


The theory for independent random sampling of elements from a population
where the unit of sampling and the unit of analysis coincide was developed by
Bernoulli more than 200 years ago. The theory that would measure the gains
to be had from introducing stratification into sampling was indicated by Poisson
a century later. Subsequently, Lexis systematized previous work and provided
the theoretical basis for sampling clusters of elements.¹ The adaptation of the
work of Bernoulli and Poisson to sampling from finite populations was sum- 
marized by Bowley in 1926 [1] approximately a century after the work of Poisson. 

An impetus to sampling advancement, following some fundamental statistical 
contributions of Pearson, Fisher, and others, resulted from the work of Neyman 
when he published his paper in 1934 on the two different aspects of the repre- 
sentative method [8]. In that paper he introduced new criteria of the optimum 
use of resources in sampling, including the concept of optimum allocation of 
sampling units to different strata subject to the restriction that the sample have 
a fixed total number of sampling units. 

If, no matter how a sample be drawn, the cost were dependent entirely on the 
number of elements included in the sample, there would be little need for theory 
beyond the classical theories of Bernoulli and Poisson covering the independent 
random sampling of elements within strata, supplemented by the extension of 
the theory to finite populations, and the extension to optimum allocation of 
sampling units. Very often, however, in statistical investigations it is extremely 
costly, if not impossible, to carry out a plan of independent random sampling 
of elements in a population. Such sampling, in practice, requires that a listing 
identifying all the elements of the population be available, and frequently this 
listing does not exist or is too expensive to get. Even if such a listing is avail- 
able, the enumeration costs may be excessive if the sample is too widespread. 
Frequently also, there are other restrictions on the sample design, such as the 
requirement that enumerators work under the close supervision of a limited 
number of supervisors, and as a consequence the field operations must be confined 
to a limited number of administrative centers. Techniques such as cluster
sampling [2, 3, 4, 5, 6, 7, 8, 10], subsampling, and double sampling [9], have been 

1 The sampling of clusters of elements refers to the sampling of units that contain more 
than one element. Examples of cluster sampling include the use of the city block or the 
county as the sampling unit when the purpose of the survey is to determine the properties 
of the population made up of individual persons or individual households. In these in- 
stances, the city block or county is referred to as the cluster of elements, and the individual 
person or household is referred to as the element. 
developed with the aim of making most effective use of available resources, while
keeping within existing administrative restrictions, and thus producing the maxi- 
mum amount of information possible within these resources and restrictions. 
Neyman [8], Yates and Zacopanay [10], Cochran [2], Mahalanobis [7], and others 
have made important contributions in this regard. 

We can illustrate a number of the developments indicated above in a simple 
but fairly general subsampling design. This design involves the sampling of 
clusters of elements from a stratified population and the subsampling of elements 
from each of the selected clusters, where the number of elements in each of the 
primary sampling units within a stratum is the same. 

Suppose we have a population made up of $L$ strata, with the $i$-th stratum containing
$M_i$ primary sampling units of $N_i$ elements each. The individual element
will be the subsampling unit. Let $X_{ijk}$ be the value of some characteristic of the
$k$-th element of the $j$-th primary sampling unit in the $i$-th stratum, and assume
that the character to be estimated is

$$\bar{X} = \sum_{i=1}^{L} \sum_{j=1}^{M_i} \sum_{k=1}^{N_i} X_{ijk} \Big/ \sum_{i=1}^{L} M_i N_i . \tag{1}$$

For example, if $\bar{X}$ is the average income per household in a given city, $X_{ijk}$ might
be the income of the $k$-th household in the $j$-th city block in the $i$-th ward;
where the household is the subsampling unit, the city block is the primary
sampling unit, and the stratification has been by wards. Suppose, further, that
we sample $m_i$ primary units from the $i$-th stratum, and subsample $n_i$ elements
from each of the primary units sampled from that stratum.

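The “best linear unbiased estimate” [8] of $\bar{X}$ from the sample will be

$$X' = \sum_{i=1}^{L} \frac{M_i N_i}{m_i n_i} \sum_{j=1}^{m_i} \sum_{k=1}^{n_i} X_{ijk} \Big/ \sum_{i=1}^{L} M_i N_i , \tag{2}$$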


and the variance of X’ is 
where $\bar{X}_{ij} = \sum_{k} X_{ijk} / N_i$ and $\bar{X}_i = \sum_{j} \sum_{k} X_{ijk} / M_i N_i$.
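
A minimal Python sketch of the estimate may help fix the notation. It assumes that the estimate has the form $X' = \sum_i (M_i N_i / m_i n_i) \sum_j \sum_k X_{ijk} / \sum_i M_i N_i$, that is, each stratum's sample total is inflated by the reciprocal of its overall sampling fraction. The data layout (a list of dictionaries with keys `M`, `N`, and `sample`) and all numerical values are hypothetical.

```python
def stratified_two_stage_estimate(strata):
    """Inflate each stratum's sample total by M_i*N_i/(m_i*n_i), then divide by sum of M_i*N_i.

    Each stratum is a dict with
      M: number of primary units in the stratum,
      N: number of elements in each primary unit,
      sample: list of m lists, each holding the n subsampled values of one primary unit.
    """
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        m = len(s["sample"])
        n = len(s["sample"][0])
        if any(len(unit) != n for unit in s["sample"]):
            raise ValueError("every sampled primary unit must contribute n_i elements")
        sample_total = sum(sum(unit) for unit in s["sample"])
        numerator += s["M"] * s["N"] / (m * n) * sample_total
        denominator += s["M"] * s["N"]
    return numerator / denominator

# Hypothetical example: two strata (wards), blocks as primary units.
strata = [
    {"M": 4, "N": 10, "sample": [[1, 2], [3, 4]]},
    {"M": 2, "N": 5, "sample": [[10, 20, 30]]},
]
print(stratified_two_stage_estimate(strata))
```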


These formulas have no practical utility in designing samples unless there are, 
in addition, some considerations of differential costs. Cost relationships sometimes
may be stated explicitly as a function of the $m_i$ and the $n_i$, or, what is
frequently the case, they may be approximated sufficiently through intuition 
and speculation to guide one to a reasonable decision among the various alter- 
natives implied by the design. 
If we know the cost function we proceed to determine the values of the $m_i$ and
the $n_i$ that make $\sigma^2_{X'}$ a minimum for a fixed total expenditure, and also subject
to any other restrictions that may be imposed. This theory provides a basis for 
determining the optimum allocation of the sampling ratios to the various strata, 
and to primary and secondary sampling units within each stratum. 
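
The kind of calculation meant here can be sketched for a single stratum under strongly simplifying assumptions that are not taken from the paper: the variance is modeled as $S_b/m + S_w/(mn)$ with no finite-population corrections, the cost as $m(c_1 + c_2 n)$, and $m$ is treated as a real number so that the whole budget is spent. Under this assumed model the number $n$ of elements per primary unit that minimizes the variance is $\sqrt{c_1 S_w / (c_2 S_b)}$, and the sketch checks that value against a direct search. All names and numbers are hypothetical.

```python
from math import sqrt

def variance_per_budget(n, budget, c_primary, c_element, s_between, s_within):
    """Variance when the whole budget is spent with n elements per primary unit.

    Assumed model: variance = s_between/m + s_within/(m*n),
    cost = m*(c_primary + c_element*n), with m treated as a real number.
    """
    m = budget / (c_primary + c_element * n)
    return s_between / m + s_within / (m * n)

def closed_form_n(c_primary, c_element, s_between, s_within):
    """Stationary point of the variance under the assumed model."""
    return sqrt(c_primary * s_within / (c_element * s_between))

params = dict(budget=1000.0, c_primary=10.0, c_element=1.0,
              s_between=1.0, s_within=40.0)
best_n = min(range(1, 201), key=lambda n: variance_per_budget(n, **params))
n_star = closed_form_n(10.0, 1.0, 1.0, 40.0)
print(best_n, n_star)
```

The optimum subsample size depends only on the ratio of the cost factors and the ratio of the variance components, which is why knowledge of their relative magnitudes matters for design.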

Such developments, however, must be regarded as only the first step in sample 
design. We cannot go forward if we only know that the optimum sample design 
is some particular mathematical function of the population parameters and the 
cost factors; we need also to know something about the relative magnitudes of 
certain parameters in the particular populations under consideration, as well as 
something about the costs associated with the various sampling and estimating 
operations. 

Thus, considerable work in recent years has been done on the study of the 
relative magnitudes of variances and covariances between and within various 
types of sampling units and on the study of costs and types of cost functions 
that operate. Work is being done in this field by the Department of Agriculture 
in connection with sampling for agricultural items, and is being done also in the 
Bureau of the Census, and in other places. 


II—THE DIRECTION OF MORE RECENT DEVELOPMENTS 


The sampling procedure indicated above involves as a first step the definition 
of the system of sampling, such as whether the sampling method will involve 
cluster sampling, double sampling, or subsampling, and along with this the 
definition of the stratification and the sampling units. The second step is that 
of determining the method of estimation, together with the allocation of the sam- 
pling units. 

The first step, that of defining the sampling system, is taken with a view to
administrative feasibility and sampling efficiency, but no simple procedure exists 
which leads one uniquely to the selection of a system except perhaps by the 
impractical method of listing and examining all possible alternatives and accept- 
ing one on some criterion of best. However, given the definition of a population 
character to be estimated, and a sampling system, a simple procedure is available 
that will provide a unique solution to the second step provided we accept some
criterion as to what “best” means, such as the best linear unbiased estimate,
subject to any cost or administrative restrictions that may be imposed. Such 
criteria lead us to both our estimating procedure and our allocation of sampling 
within the sampling system defined. 

While no theory with practical applicability has been developed which indicates
a “best” system of sampling, and at the same time indicates the “best”
estimating procedure and sampling allocation, some progress in the choice of 
improved sampling systems and estimating procedures has been made. The 
developments in the following two directions appear to us to be particularly 
pertinent. 

1. Modifications in some of the fairly generally accepted criteria of good
sample estimates have led to more reliable sample results for some types of
sampling systems (some of these are mentioned in Sec. III);

2. Some principles are emerging that have led to improved determination of
the sampling units, the strata, and other aspects of the sampling system
(some efforts at formulating such principles are reported in Secs. IV, V, and
VI).

We shall summarize, principally, some of the recent work in the Census—and 
in so doing shall mention some work of others that is closely related. Most of 
the work that we shall summarize relates to problems where the sampling units 
are clusters of elements and vary in size. 


III—MODIFICATIONS IN THE CRITERIA FOR GOOD ESTIMATES 


The estimate given in the general subsampling problem formulated in Sec. I 
satisfies the criterion of the “best linear unbiased estimate.” Also, as far as our
experience has indicated, this estimate is frequently the most efficient one for
populations of the form described, that is, where the number of elements in each
sampling unit within a stratum is the same. However, if the numbers of elements
differ between sampling units, a biased but consistent estimate can frequently
be found that has a substantially smaller mean square error² than the
best linear unbiased estimate. 

For example, consider the case where clusters of elements are the sampling units
and we want to estimate $\bar{X} = \sum_{i=1}^{M} X_i \big/ \sum_{i=1}^{M} N_i$, the average value per element
of some specified characteristic. Here $M$ is the number of sampling units in the
population, $X_i$ is the aggregate value of the specified character for all elements
in the $i$-th cluster, and $N_i$ is the number of elements in that cluster. The joint
distribution of $X_i$ and $N_i$ is unknown, but $\sum_{i=1}^{M} N_i = N$ is known. Under these
circumstances the “best linear unbiased estimate” of $\bar{X}$ from a sample of $m$
clusters turns out to be $\frac{M}{m} \sum_{i=1}^{m} X_i / N$. However, a smaller mean square error is
often obtained by the use of a ratio estimate from the sample such as
$\sum_{i=1}^{m} X_i \big/ \sum_{i=1}^{m} N_i$. This estimate is excluded by the “best linear unbiased” criterion
because it is nonlinear and biased, although the bias is usually negligible
and the estimate is consistent. Since the best linear unbiased estimate of $\bar{X}$
requires the knowledge of $N$, the sample ratio has a further advantage in that
it can be used even when $N$ is not known.
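
The comparison between the two estimates can be checked exhaustively on a small artificial population. The Python sketch below assumes that the $m$ clusters are drawn with equal probability and without replacement, enumerates every possible sample, and reports the bias and mean square error of the unbiased estimate $(M/m) \sum X_i / N$ and of the ratio estimate $\sum X_i / \sum N_i$. The population values are hypothetical and were chosen so that the cluster totals are roughly proportional to the cluster sizes.

```python
from itertools import combinations

def compare_estimates(X, N, m):
    """Enumerate all samples of m clusters; return (bias, MSE) for each estimate."""
    M, N_total = len(X), sum(N)
    true_mean = sum(X) / N_total
    unbiased, ratio = [], []
    for idx in combinations(range(M), m):
        sx = sum(X[i] for i in idx)
        sn = sum(N[i] for i in idx)
        unbiased.append((M / m) * sx / N_total)   # requires knowledge of N
        ratio.append(sx / sn)                     # uses only sample quantities

    def summarize(values):
        mean = sum(values) / len(values)
        mse = sum((v - true_mean) ** 2 for v in values) / len(values)
        return mean - true_mean, mse

    return summarize(unbiased), summarize(ratio)

X = [5, 9, 13, 15, 21]     # hypothetical cluster totals
N = [2, 4, 6, 8, 10]       # hypothetical cluster sizes
(bias_u, mse_u), (bias_r, mse_r) = compare_estimates(X, N, m=2)
print(bias_u, mse_u)
print(bias_r, mse_r)
```

In this population the ratio estimate carries a small bias but a much smaller mean square error, because the variation in cluster size, which dominates the error of the unbiased estimate, largely cancels in the ratio.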


A recent paper by Cochran [3] gives a number of consistent though biased estimates
of $\bar{X}$ that make use of the least square estimate of the linear regression of
$X_i$ on $N_i$. These estimates generally have a smaller mean square error than
either the best unbiased linear estimate or the simple ratio estimate given above.
However, they require knowledge of $N$, as does the best linear unbiased estimate,
and in addition may require detailed tabulations and considerable clerical work
as a part of the estimating process.


2 In this paper the terms “mean square error” and “variance” are used interchangeably
to refer to $E(X' - \bar{X})^2$ when $EX'$ is equal to $\bar{X}$, the population character to be estimated.
When $EX'$ is not equal to $\bar{X}$, however, $E(X' - \bar{X})^2$ will be referred to only as the “mean square
error.” Since, under these latter circumstances, $E(X' - \bar{X})^2 = E(X' - EX')^2 + (EX' - \bar{X})^2$,
the mean square error is equal to the variance of $X'$ plus the contribution due to the bias.

Both types of biased estimates mentioned above are consistent, and usually
have a smaller mean square error than the best linear unbiased estimate for
sampling systems in which the sampling units vary in size. Thus, improved
sample estimates will be obtained by modifying the “best linear unbiased
estimate” criterion to include estimates that are nonlinear but consistent and have
a smaller mean square error than the best linear unbiased estimate.


IV—IMPROVEMENTS IN THE SPECIFICATIONS OF 
SAMPLING SYSTEMS 


A great deal can be done to improve sampling designs through improved specification
of the sampling system even though one has only a limited knowledge of
the manner in which the population is likely to be made up, and no specific
information concerning the particular population parameters involved (see
Sec. VI).


1. The sizes of sampling units. A number of recent investigations have
indicated the desirability, with costs considered, of keeping the size of cluster
very small when clusters of elements are used as the sampling unit in field surveys
[2, 5, 6, 7, 8]. It is important to point out, however, that this principle is
not necessarily applicable to subsampling systems, and that the use of large 
clusters as the primary sampling units in a system involving subsampling may 
yield distinct gains over the use of smaller clusters without subsampling. More- 
over, one of the often recurring problems in large-scale studies is the designing of 
sample surveys within stringent administrative restrictions on the number of 
different locations in which operations can be carried on. Under such restric- 
tions a procedure commonly used is to choose a limited number of existing 
political units, such as counties, as the primary sampling units, and then to sub- 
sample units such as blocks, small rural areas, or households. Under the circum- 
stances, if the numbers of primary subsampling units to be included in the 
sample are assumed to be held constant, the use of larger primary sampling units 
than the existing political units would have the effect of decreasing the sampling 
variance. 

The advantage of using large primary units in subsampling is evident in the 
simple case when the original units, each having the same number of elements, 
are consolidated to form half as many enlarged primary units, each twice as large 
as the original units. The variance between the enlarged primary units will be 
$\sigma_e^2 = \tfrac{1}{2}\sigma_o^2(1 + \rho)$, where $\sigma_o^2$ is the variance between the original primary units,
and $\rho$ is the correlation between the units that are paired. The correlation coefficient
will be close to zero (exactly equal to $-1/(M - 1)$, where $M$ is the number
of original primary units) if the pairing is done at random, and it follows that the
variance between counties is then cut at least in half. Ordinarily, $\rho$ will be
greater than zero if the paired units are required to be contiguous. However,
through choosing for consolidation those contiguous units that are as different
as possible, $\rho$ is made as small as possible, and in some instances this minimum
value may even be negative. In any event, the smaller the value that $\rho$ takes on,
the greater the reduction of the sampling variance between primary units from 
the use of enlarged units. While the sampling variance within primary units is 
increased by such consolidations, the increase is slight, and the total sampling 
variance is almost invariably decreased (see Appendix, Section 1). 
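
The halving relation for consolidated units can be verified numerically. The Python sketch below assumes that each original primary unit is represented by its per-element mean, that all units are of equal size, and that the correlation between paired units is measured about the overall mean. Under those assumptions the variance between the means of the paired units equals one half of the original variance times one plus the correlation, exactly. A second function checks that the average correlation over all possible pairs of distinct units is $-1/(M - 1)$. The unit values and the pairing are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def consolidation_check(pairs):
    """Return (variance between pair means, 0.5 * sigma_o^2 * (1 + rho), rho)."""
    units = [v for pair in pairs for v in pair]
    mu = mean(units)
    sigma_o2 = mean([(v - mu) ** 2 for v in units])          # variance between original units
    rho = mean([(a - mu) * (b - mu) for a, b in pairs]) / sigma_o2
    sigma_e2 = mean([((a + b) / 2 - mu) ** 2 for a, b in pairs])
    return sigma_e2, 0.5 * sigma_o2 * (1 + rho), rho

def random_pairing_rho(units):
    """Average correlation term over all ordered pairs of distinct units."""
    mu = mean(units)
    sigma_o2 = mean([(v - mu) ** 2 for v in units])
    M = len(units)
    cross = [(units[i] - mu) * (units[j] - mu)
             for i in range(M) for j in range(M) if i != j]
    return mean(cross) / sigma_o2

pairs = [(1, 4), (2, 8), (5, 7), (3, 6)]     # hypothetical unit means, paired
lhs, rhs, rho = consolidation_check(pairs)
print(lhs, rhs, rho)
print(random_pairing_rho([v for pair in pairs for v in pair]))
```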

The restriction on extending the consolidation of primary units is introduced by
the increased cost of subsampling within larger and larger areas. This increased
cost is to be weighed against the decreased variance. If the cost restriction 
were not sufficiently severe, consolidation would proceed to the point of eliminat- 
ing the use of primary sampling units altogether, and the subsampling units 
would be selected independently throughout the entire stratum. 


2. Subsampling where the primary units are of unequal size. Use of proba- 
bility proportionate to size in subsampling. A subsampling system frequently 
followed, whether or not the primary sampling units vary in size, involves the 
selection of one or more primary units from each stratum with the probability 
of selection the same for each primary unit in the stratum, and the subsampling 
of a fixed proportion of the subsampling units from the selected primary unit. 
When the primary units vary in size this subsampling system has some ad- 
ministrative disadvantages that arise because the number of subsampling units 
to be included in the sample will vary with the number of elements in the se- 
lected primary unit. (The term “size” of sampling unit as used in this paper 
refers to the number of elements in the sampling unit.) 

The disadvantages in the above system have led in some instances to the speci- 
fication of a second subsampling system in which, although the primary units 
were selected with equal probability, the subsampling has been of a constant 
number rather than of a constant proportion. 

A third subsampling system that can be recommended over both the above 
systems is to make the probability of selection of a primary unit proportionate 
to its size and then to subsample a constant number of subsampling units. 

We shall assume that for all three systems only one primary unit is selected 
from each stratum. Stratification to this degree leads to a smaller sampling 
variance than does less extensive stratification. For simplicity in making comparisons,
we shall assume, furthermore, that the subsampling unit is the element
of analysis and that the sample estimate used is of the form $X' = \sum N_h \bar{X}_h / \sum N_h$,
where $\bar{X}_h$ is the sample average, for the $h$-th stratum, of the character being
estimated, and $N_h$ is the size of that stratum. This estimate, which is frequently
used, is biased for the first two systems but unbiased for the recommended system.
However, an unbiased estimate, say the “best” linear unbiased estimate
for the first two systems generally has a much larger mean square error than the 
biased estimates used in these comparisons and hence has not been considered in 
the comparisons which follow (see Sec. VII, footnote 7). 

The first two subsampling systems mentioned are about equally efficient when 
the number of subsampling units drawn from each primary unit is reasonably 
large, but each will usually have a larger mean square error than will the recom- 
mended system. The difference between the mean square errors of either of the 
first two and that of the recommended design is given approximately by 


$$\frac{1}{N^2} \sum_{h=1}^{L} Q_h \bar{N}_h \sigma_h^2 \Big[ \sum_{j} \rho_{hj} \bar{N}_h - \sum_{j} \rho_{hj} N_{hj} \Big] \tag{4}$$


where, within the $h$-th stratum, $N_{hj}$ is the number of elements in the $j$-th primary
sampling unit, $\bar{N}_h$ is the average size of primary sampling unit, $Q_h$ is the number
of primary sampling units, $\rho_{hj}$ is the intra-class correlation between elements
within the $j$-th unit and $\sigma_h^2$ is the variance between individual elements within
the stratum; $L$ is the number of strata. (See Section 2 of the Appendix for the
development of this difference.)

This difference, which is a multiple of the average covariance between the
$N_{hj}$ and $\rho_{hj}$, will be positive if $N_{hj}$ and $\rho_{hj}$ are negatively correlated, and this is
exactly the situation that exists in most practical problems we have encountered 
in sampling for social and economic statistics (see Sec. VI). 

The reduction in the mean square error arises because the recommended de- 
sign provides a more nearly optimum allocation of sampling as between large 
and small sampling units than do the other two. It might be possible, of course, 
as another alternative, to stratify the primary units by size and then allocate 
sampling to the various strata on the basis of optimum sampling considerations. 
However, this would mean that some other and perhaps more important modes 
of stratification would be sacrificed, and moreover, the optimum allocation of 
sampling between the larger and smaller units could only be guessed at in most 
practical problems. Furthermore, it usually is not possible to stratify on size 
to the point that there is no variation in the sizes of units within a stratum. 

The sample estimate from the recommended system is unbiased whereas the 
estimates from the other two are usually biased, and sometimes fairly seriously 
so. (For a proof of this statement see Appendix, Section 1, and see also Sec. 
VII for a numerical illustration.) 
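
The statement about bias can be illustrated for a single stratum from which one primary unit is drawn. The Python sketch below relies on one assumption beyond the text: elements are subsampled at random within the selected unit, so that the expected subsample mean equals the unit mean. The expected value of the sample average is then the probability-weighted average of the unit means, which equals the stratum mean when the selection probabilities are proportionate to size and generally differs from it when they are equal. The stratum shown is hypothetical.

```python
from fractions import Fraction

def expected_sample_mean(unit_values, probabilities):
    """E[sample mean] when one unit is drawn with the given probabilities and
    elements are subsampled at random within it (E[subsample mean] = unit mean)."""
    return sum(p * Fraction(sum(vals), len(vals))
               for vals, p in zip(unit_values, probabilities))

# Hypothetical stratum with three primary units of sizes 3, 6 and 1.
units = [[1, 2, 3], [10, 10, 10, 10, 10, 10], [4]]
sizes = [len(u) for u in units]
total = sum(sizes)
stratum_mean = Fraction(sum(sum(u) for u in units), total)

equal = expected_sample_mean(units, [Fraction(1, len(units))] * len(units))
pps = expected_sample_mean(units, [Fraction(n, total) for n in sizes])
print(stratum_mean, equal, pps)
```

With equal probabilities the small units are given too much weight, which is the source of the bias in the first two systems; the same expected value applies whether a constant proportion or a constant number is subsampled.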

The use of probability proportionate to size serves to decrease only the sam- 
pling variation between primary units and has very little effect on the sam- 
pling variance within. Therefore, the recommended design shows its greatest 
advantage over the two alternatives when the contribution of the mean square 
error between primary units to the total mean square error is large. 

Ordinarily, the actual sizes of the primary sampling units will not be known, 
but numbers may be known that are highly correlated with the sizes. For 
example, ordinarily we will not know the populations of blocks or of cities or
counties at the time a sample is taken, but we may know their populations at the
preceding census. Under these circumstances the primary units may be sampled
with probabilities proportionate to the previously known (or their estimated)
sizes, but if this is done the subsampling is to be modified in order to take account
of the changes in the sizes between the two dates. If the actual sizes are known,
the constant number taken from the selected primary unit in the h-th stratum is
$n_h = t_h N_h$, where $t_h$ is the sampling ratio assigned to the stratum, and $N_h$ is the
total number of elements in the stratum. The subsampling ratio within the
selected primary unit, therefore, is $t_h N_h / N_{hj}$, where $N_{hj}$ is the number of
elements in the selected unit. On the other hand, if there is available only a
measure of size $P_{hj}$ highly correlated with the actual sizes of the units $N_{hj}$, and if the
probability of selection of the primary unit has been proportionate to the $P_{hj}$,
the subsampling ratio in the selected primary unit will be equal to $t_h P_h / P_{hj}$,
where $P_h$ is the measure of size of the entire stratum, and $P_{hj}$ is the measure of
size of the selected primary unit. The variance of a sample estimate where
measures of size are used is given subsequently in this paper (see Eq. (9)).
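
As a small numerical sketch of the subsampling ratio just described, the Python snippet below computes $t_h P_h / P_{hj}$ for each primary unit of one stratum and checks that, when the unit is selected with probability $P_{hj}/P_h$, every element has the same over-all chance $t_h$ of entering the sample. The unit names and the measures of size are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def subsampling_ratio(t_h, P_h, P_hj):
    """Within-unit sampling ratio t_h * P_h / P_hj for the selected primary unit."""
    return t_h * Fraction(P_h, P_hj)

# Hypothetical stratum: measures of size, e.g. previous-census populations.
sizes = {"unit A": 6000, "unit B": 3000, "unit C": 1000}
P_h = sum(sizes.values())
t_h = Fraction(1, 100)            # over-all sampling ratio assigned to the stratum

overall = {}
for unit, P_hj in sizes.items():
    prob_select = Fraction(P_hj, P_h)             # probability proportionate to size
    ratio = subsampling_ratio(t_h, P_h, P_hj)     # ratio applied inside the unit
    overall[unit] = prob_select * ratio           # chance that a given element is drawn
    print(unit, float(ratio), overall[unit])
```

The smaller the selected unit, the larger the ratio applied within it, which is what keeps the expected sample from the stratum at the same size whichever unit is drawn.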


3. The use of area substratification within primary strata in a subsampling 
system. Another modification, which will be called area substratification 
within primary strata, may be particularly useful where a relatively small sample 
is required from a population covering a large area, and where operations must 
be confined to a limited number of centers. 

Some preliminary remarks are necessary before area substratification can be 
explained. Area substratification requires (a) that the entire population to be 
sampled be divided into areas that will serve as primary sampling units; (b) that 
these units be further subdivided into a number of sub-areas; and (c) that certain 
summary statistical information be available for each of the sub-areas in advance 
of drawing the sample. The information that must be known for the sub-areas 
includes a reasonably good measure of their sizes (perhaps the total population, 
total dwelling units, or total farms) and other information which is indicative of 
the characteristics of the area, such as whether predominantly farm or nonfarm,
predominantly white or colored, etc. The sub-areas, when grouped into homo- 
geneous classes, will serve only to determine the substrata described subse- 
quently, and will not ordinarily serve as the subsampling units, which may be 
defined independent of the sub-areas. 

The definition of the primary sampling units and the classification of them
into strata proceed as indicated earlier, with the primary units made as internally 
heterogeneous as possible within strata that are as homogeneous as possible. It 
will be assumed that only one primary unit is sampled from each stratum, and 
that the probability of selecting the j-th primary unit within the h-th stratum is 
proportionate to $P_{hj}$, where $P_{hj}$ is the measure of size of the primary unit and is
equal to the sum of the measures of size of the sub-areas that it contains. It will
be assumed, also, that $t_h$, the over-all sampling ratio to be used within the h-th
stratum, has been determined for all strata on the basis of considerations of 
optimum allocation.

The introduction of area substratification within primary strata may then be
accomplished as follows:

(a) The sub-areas within each primary stratum are classified into substrata 
on the basis of their characteristics. (For example, they may be classified 
into predominantly farm and predominantly nonfarm sub-areas, and 
these further classified on the basis of the average size of farm or average 
rental value of the dwelling units. In such a case, the sub-areas within 
the primary stratum that are predominantly farm and that have average 
rental values lying within a specified interval constitute a substratum.) 
(b) The sub-areas within the primary unit selected from each primary stratum
are classified into the same substrata.
(c) Subsampling units are defined within each of the substrata within the
selected primary units. The number of subsampling units defined within
that part of the i-th substratum that is contained within the j-th primary
unit is denoted by $M_{hij}$. (Various types of subsampling units may be
defined, such as the individual person, farm, dwelling unit, or structure, a
very small area, etc. The subsampling units need be defined only within
the selected primary sampling units.)
(d) The number of subsampling units to be included in the sample from the
i-th substratum within the selected (j-th) primary sampling unit is


(5)    $m_{hij} = M_{hij}\, t_h\, P_{hi} / P_{hij}$,


where $P_{hij}$ is the measure of size of that part of the i-th substratum that
lies within the j-th primary unit, and $P_{hi} = \sum_j P_{hij}$ is the sum of the
measures of size of the sub-areas contained in the i-th substratum of the
h-th primary stratum. This method of allocating the subsampling pro- 
vides that the subsample drawn from the selected primary unit is repre- 
sentative, so far as possible, of the entire stratum, rather than of the par- 
ticular primary unit that happens to be included in the sample from that 
stratum. To illustrate, suppose the numbers of persons in sub-areas from 
the 1940 census are used as the measures of their sizes, and that the sub- 
areas are classified into substrata on the basis of their characteristics in 
1940 as indicated by the 1940 Decennial Census of Population. The 
allocation of the subsampling indicated above then provides that if the 
proportion of the total population residing in sub-areas that are pre- 
dominantly farm is 30 percent, the sample will be drawn in such a manner 
that 30 percent of the 1940 population expected in the sample would be 
from the predominantly farm sub-areas, even though, in the selected 
primary sampling unit, perhaps only 15 percent of the 1940 population 
might reside in such areas. 
(e) The population character to be estimated is


(6)    $X = \sum_{h=1}^{L} \sum_{i=1}^{S_h} \sum_{j=1}^{Q_h} \sum_{k=1}^{M_{hij}} X_{hijk}$,


where $X_{hijk}$ is the aggregate value of a specified characteristic for all of the
elements contained within the k-th subsampling unit in the i-th substratum
of the j-th primary unit; $S_h$ is the number of substrata and $Q_h$ is the
number of primary units in the h-th primary stratum; and L is the number
of primary strata. (X might be the total number of workers in the
United States, or the total number of farm laborers, etc.) An estimate of
X from the sample is


(7)    $X' = \sum_{h=1}^{L} \frac{1}{t_h} \sum_{i=1}^{S_h} \sum_{k=1}^{m_{hij}} X_{hijk}$.


No summation over j is involved, because only one primary unit is drawn
from the h-th stratum. This is a very simple estimate, involving a sum
weighted only at the primary strata level. If the $t_h$ are all set equal to
t, i.e., if a constant proportion is sampled from each stratum, the estimate 
becomes merely the total number of elements in the sample having the 
specified characteristic multiplied by 1/t, the reciprocal of the sampling 
ratio. 
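
The allocation rule (5) and the estimate (7) can be sketched in a few lines of Python. The figures below are hypothetical and follow the farm and nonfarm illustration in item (d): the farm substratum holds 30 percent of the stratum's measure of size but only 15 percent of the selected primary unit's. For simplicity the sketch assumes that the subsampling units are persons, so that $M_{hij}$ equals $P_{hij}$; that equality is an assumption of the example, not a requirement of the method.

```python
from fractions import Fraction

def allocate(M_hij, t_h, P_hi, P_hij):
    """Eq. (5): subsampling units to draw from substratum i of the selected unit j."""
    return M_hij * t_h * Fraction(P_hi, P_hij)

def estimate_total(samples_by_stratum):
    """Eq. (7): sum over strata of (1/t_h) times the sample total for the stratum.

    samples_by_stratum is a list of (t_h, sampled X values) pairs, one per stratum.
    """
    return sum(Fraction(sum(values)) / t_h for t_h, values in samples_by_stratum)

t_h = Fraction(1, 100)
P_hi = {"farm": 30000, "nonfarm": 70000}      # stratum-wide measures of size
P_hij = {"farm": 3000, "nonfarm": 17000}      # parts lying in the selected unit
M_hij = dict(P_hij)                           # assumption: one subsampling unit per person

m_hij = {s: allocate(M_hij[s], t_h, P_hi[s], P_hij[s]) for s in P_hi}
farm_share = m_hij["farm"] / sum(m_hij.values())
print(m_hij, farm_share)
```

Although only 15 percent of the selected unit's measure of size is in farm sub-areas, 30 percent of the subsample is allocated to them, matching the stratum as a whole.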

The allocation of the subsampling indicated above may be deviated from and 
the controls of area substratification can still be maintained if proper modifica- 
tions are made in the sample estimate. In this event, differential weighting 
must be introduced at the substrata level rather than only for the primary strata. 

The definition of heterogeneous primary sampling units, the proper classifica- 
tion of them into strata, and the use of probabilities proportionate to the meas- 
ures of size in the selection of the primary units are particularly desirable if area 
substratification is used. If these are not introduced the likelihood of making 
substantial gains through the use of area substratification is decreased. The 
definition of the primary strata should be made in conjunction with the definition 
of the substrata, and should insure that each primary unit has adequate repre- 
sentation of each substratum that is to be defined within that primary stratum. 
With this restriction observed, the number of significant substrata that can be 
defined will be limited by the heterogeneity of the primary units. Thus, in 
order to provide for substratification into predominantly farm and predomi- 
nantly nonfarm areas, the primary sampling units should be defined so that both 
farm and nonfarm areas are represented in each unit. This procedure not only 
makes area substratification more effective, but improves the efficiency of the 
sample in making separate estimates for such classes of the population. How- 
ever, if this procedure cannot be adhered to exactly in practice, primary units in 
which certain of the substrata are not represented will occasionally come into the 
sample. One alternative when this occurs is to combine certain substrata; 
another is to exclude such primary units from the sample. 

Since the number of primary strata is restricted by the number of primary 
units to be sampled, it is wasteful to set up strata at the primary level with re- 
spect to sources of variation that can be controlled adequately through area
substratification. For example, if farm areas and nonfarm areas are to be dis-
tinguished in the substrata, the primary strata should not be exhausted by classi- 
fying the primary units into a large number of strata by percent farm (percent 
of total population in primary unit living on farms), since the effect of the sub- 
stratification is to control the variation in the percentage farm. Limiting the 
number of percentage farm classes at the primary level makes possible the use 
of other modes of stratification that will control on farm type, or on the indus-
trial character of the nonfarm population, or on some other similar criteria. 

Area substratification is to be distinguished from the fairly commonly used 
method of specifying the number of elements to come into the sample from each
of several different classes of elements—whether such quotas are fixed to make the 
sample correspond with the specified characteristics of the entire primary stra- 
tum or of the selected primary sampling unit. The method of fixing quotas and 
instructing interviewers or enumerators to obtain a given number of elements 
(persons, dwelling units, farms, voters, etc.) having various specified charac- 
teristics has a fundamental weakness that is avoided in area substratification 
within primary strata. Such quotas ordinarily must be set on the basis of pre- 
vious information or rough estimates, and thus cannot accurately reveal chang- 
ing characteristics of the population. Area substratification, on the other hand, 
uses previous information to insure the proper representation of various types of 
areas in the sample. The numbers of elements obtained with various specified 
characteristics are determined from the population as it is, and not as it was at 
some previous date. In times of rapid change the fixing of quotas on the basis of 
previous information may introduce increasingly serious biases. 

The gain from using previously available information in stratifying areas 
arises from the fact that there is a high correlation in the characteristic of an 
area from time to time over a period of several years. An area that is pre- 
dominantly farm at one date ordinarily will be predominantly farm a few years 
later. Similarly, while very substantial shifts in population may occur, the num- 
bers of persons in a set of areas at one time ordinarily will be very highly corre- 
lated with the numbers a few years later. However, area substratification does 
not depend on the fact that no shifts occur. If shifts have occurred it will 
measure them. If the shifts have been sufficient to completely alter the charac- 
ter of most small areas, it will still provide estimates revealing the changing
character of the population, but under these circumstances the efficiency of the
method is decreased. 


V—EXPECTED VALUES AND VARIANCES FOR THE SUBSAMPLING 
SYSTEM INCORPORATING THE PRINCIPLES OUTLINED ABOVE 


The system of sampling incorporating the principles of enlarged primary 
units, the selection of primary units with probabilities proportionate to the 
measures of size and area substratification will be examined more fully below. 
It will be referred to, for convenience, as the specified subsampling system.


1. The expected value of an estimated total for the specified subsampling
system. All summations in the formulas below are over the population unless
otherwise indicated. The expected value of X’ as defined in Eq. (7) is


$E X' = \sum_h \sum_i \sum_j \sum_k \frac{1}{t_h} \frac{P_{hj}}{P_h} \frac{m_{hij}}{M_{hij}} X_{hijk}$.

From (5), $t_h = m_{hij} P_{hij} / (M_{hij} P_{hi})$, and therefore

$E X' = \sum_h \sum_i \sum_j \sum_k \frac{P_{hi}}{P_{hij}} \frac{P_{hj}}{P_h} X_{hijk}$

$\quad = \sum_h P_h \sum_j \sum_i \frac{P_{hj}}{P_h} \frac{P_{hi}}{P_h} \frac{X_{hij}}{P_{hij}} = \sum_h P_h R_{h(A)}$,

where

$P_h = \sum_i P_{hi} = \sum_j P_{hj}$;   $R_{h(A)} = \sum_j (P_{hj}/P_h) R_{hj(A)}$;

$R_{hj(A)} = \sum_i (P_{hi}/P_h) R_{hij}$;   and   $R_{hij} = \sum_k X_{hijk}/P_{hij} = X_{hij}/P_{hij}$.


The $R_{hj(A)}$ will be referred to as the adjusted ratio for the j-th primary unit.
It is the weighted average within the j-th unit of the substrata ratios, $R_{hij}$,
where the same set of weights $P_{hi}$ is applied to the $R_{hij}$ in each primary unit
within a stratum. The $R_{h(A)}$ is the average, within the h-th stratum, of the
adjusted ratios. Hence


(8)    $E X' = X + \sum_h P_h (R_{h(A)} - R_h)$,


where


$R_h = X_h / P_h$, with $X_h = \sum_i \sum_j X_{hij}$,


is the ratio of the aggregate value of the specified characteristic for the elements
in the h-th stratum to the measure of size of that stratum, and where the popula-
tion character being estimated (6) is equal to $X = \sum_h X_h = \sum_h P_h R_h$.

From (8), it is seen that X’ is a biased estimate of X, although ordinarily, in 
practice, only slightly so. The bias, equal to $\sum_h P_h (R_{h(A)} - R_h)$, is the sum of the
biases for the various primary strata. Under many practical circumstances
some of these will be slightly negative and some slightly positive, with the result
that the total bias will be relatively small. The bias would be nonexistent if
area substratification were not used, or if the form of the sample estimate were
properly modified, but here again, as in the case of substituting biased for
unbiased estimates discussed in Sec. III, the introduction of a slight bias may result
in a substantial reduction in the variance. 

A sufficient, although not necessary, condition for the sample estimate (7)
with area substratification to be unbiased is for the ratios $P_{hij}/P_{hj}$ to be
uncorrelated with the $R_{hij}$ within each substratum. Under these circumstances


$\sum_j \frac{P_{hj}}{P_h} \frac{P_{hij}}{P_{hj}} R_{hij} = \left( \sum_j \frac{P_{hj}}{P_h} \frac{P_{hij}}{P_{hj}} \right) \left( \sum_j \frac{P_{hj}}{P_h} R_{hij} \right) = \frac{P_{hi}}{P_h} \sum_j \frac{P_{hj}}{P_h} R_{hij}$,


and therefore


$R_h = \sum_i \sum_j \frac{P_{hj}}{P_h} \frac{P_{hij}}{P_{hj}} R_{hij} = \sum_i \frac{P_{hi}}{P_h} \sum_j \frac{P_{hj}}{P_h} R_{hij} = R_{h(A)}$.


To illustrate, if the measures of size are the 1940 populations, then the sample
estimate will be unbiased if the proportions of the 1940 populations of the pri-
mary sampling units that are in the various substrata are uncorrelated with the
corresponding $R_{hij}$. As indicated earlier these conditions are approximated in
many practical problems, especially if the primary stratification has been carried 
out effectively. Moreover, if the conditions are not met approximately, the 
bias introduced may still be very small. (See Sec. VII for a numerical illus- 
tration.) 
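
The bias term in (8) and the sufficient condition for unbiasedness can be checked numerically. The sketch below computes $R_h$, $R_{h(A)}$ and the bias contribution $P_h (R_{h(A)} - R_h)$ for one primary stratum from hypothetical measures of size and substratum totals. In the second data set every primary unit has the same proportion of its measure of size in each substratum, a special case of the uncorrelated condition, and the bias vanishes.

```python
from fractions import Fraction as F

def stratum_bias(P, X):
    """Bias contribution of one primary stratum.

    P[j][i] is the measure of size P_hij and X[j][i] the total X_hij for
    substratum i of primary unit j.  Returns (R_h, R_h(A), P_h * (R_h(A) - R_h)).
    """
    Q, S = len(P), len(P[0])
    P_hj = [sum(row) for row in P]
    P_hi = [sum(P[j][i] for j in range(Q)) for i in range(S)]
    P_h = sum(P_hj)
    R_h = F(sum(sum(row) for row in X), P_h)
    # Adjusted ratio of each primary unit: fixed weights P_hi / P_h.
    R_adj = [sum(F(P_hi[i], P_h) * F(X[j][i], P[j][i]) for i in range(S))
             for j in range(Q)]
    R_h_adj = sum(F(P_hj[j], P_h) * R_adj[j] for j in range(Q))
    return R_h, R_h_adj, P_h * (R_h_adj - R_h)

X = [[20, 3], [13, 6]]                              # hypothetical X_hij
biased = stratum_bias([[4, 2], [2, 4]], X)          # proportions differ between units
unbiased = stratum_bias([[4, 2], [8, 4]], X)        # same proportions in every unit
print(biased, unbiased)
```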


2. The mean square error of an estimated total for the specified subsampling 
system. For the development of the mean square error of X’ for the specified 
subsampling system, see the Appendix, Section 2. There it is shown that the 
mean square error of X’ is 


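(9)    $\sigma^2_{X'} = \sum_h \sum_i \sum_j P_{hi}^2 \, \frac{P_{hj}}{P_h} \, \frac{M_{hij} - m_{hij}}{M_{hij} - 1} \, \frac{\sigma^2_{hij}}{m_{hij} \bar{P}^2_{hij}}$

       $+ \sum_h P_h^2 \sum_j \frac{P_{hj}}{P_h} (R_{hj(A)} - R_{h(A)})^2 + \left[ \sum_h P_h (R_{h(A)} - R_h) \right]^2$,


where

$\sigma^2_{hij} = \sum_k (X_{hijk} - \bar{X}_{hij})^2 / M_{hij}$


is the variance between subsampling units within a substratum of the aggregate
value of a specified characteristic for the subsampling unit and


$\bar{P}_{hij} = P_{hij} / M_{hij}$


is the average measure of size of the subsampling units in the h-i-j-th area.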

The first term of (9) is the contribution of the variance between subsampling 
units and may be kept small by proper definition of the subsampling units, and, 
of course, by increasing the subsampling ratio. The second term of (9) is the 
contribution of the variance between primary sampling units within strata; 
and the third term is the contribution of the bias, which, as indicated before, 
ordinarily will be of negligible size, so that the mean square error and the vari- 
ance will be approximately equal. 
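
The three contributions to (9) can be checked on a very small population by listing every possible sample. The Python sketch below does this for a single primary stratum with two primary units and two substrata. It assumes the finite-population factor $(M_{hij} - m_{hij})/(M_{hij} - 1)$ with $\sigma^2_{hij}$ computed with divisor $M_{hij}$, and simple random subsampling without replacement within each substratum. The data are hypothetical and were chosen so that the $m_{hij}$ satisfy (5) with $t_h = 1/6$.

```python
from fractions import Fraction as F
from itertools import combinations, product

def stratum_sizes(P):
    """Return (list of P_hj, list of P_hi, P_h) from P[j][i] = P_hij."""
    Q, S = len(P), len(P[0])
    P_hj = [sum(P[j]) for j in range(Q)]
    P_hi = [sum(P[j][i] for j in range(Q)) for i in range(S)]
    return P_hj, P_hi, sum(P_hj)

def mse_terms(P, X, m):
    """Within, between and squared-bias terms of (9) for one primary stratum.

    X[j][i] lists the X_hijk; m[j][i] is the subsample size m_hij.
    """
    P_hj, P_hi, P_h = stratum_sizes(P)
    Q, S = len(P), len(P[0])
    X_h = sum(sum(units) for row in X for units in row)
    within, R_adj = F(0), []
    for j in range(Q):
        r = F(0)
        for i in range(S):
            units = X[j][i]
            M = len(units)
            mean = F(sum(units), M)
            var = sum((x - mean) ** 2 for x in units) / M    # sigma^2_hij
            fpc = F(M - m[j][i], M - 1)                       # requires M >= 2
            p_bar = F(P[j][i], M)                             # average measure of size
            within += P_hi[i] ** 2 * F(P_hj[j], P_h) * fpc * var / (m[j][i] * p_bar ** 2)
            r += F(P_hi[i], P_h) * F(sum(units), P[j][i])
        R_adj.append(r)                                       # adjusted ratio R_hj(A)
    R_h_adj = sum(F(P_hj[j], P_h) * R_adj[j] for j in range(Q))
    between = P_h ** 2 * sum(F(P_hj[j], P_h) * (R_adj[j] - R_h_adj) ** 2 for j in range(Q))
    bias = P_h * R_h_adj - X_h
    return within, between, bias ** 2

def mse_by_enumeration(P, X, m, t_h):
    """Exact mean square error of X' found by listing every possible sample."""
    P_hj, _, P_h = stratum_sizes(P)
    X_h = sum(sum(units) for row in X for units in row)
    total = F(0)
    for j, row in enumerate(X):
        draws = list(product(*(combinations(units, m[j][i]) for i, units in enumerate(row))))
        for draw in draws:
            estimate = F(sum(sum(s) for s in draw)) / t_h
            total += F(P_hj[j], P_h) * (estimate - X_h) ** 2 / len(draws)
    return total

P = [[4, 2], [2, 4]]                                   # hypothetical P_hij
X = [[[3, 5, 4, 8], [1, 2]], [[6, 7], [2, 0, 1, 3]]]   # hypothetical X_hijk
m = [[1, 1], [1, 1]]                                   # satisfies (5) with t_h = 1/6
t_h = F(1, 6)
terms = mse_terms(P, X, m)
exact = mse_by_enumeration(P, X, m, t_h)
print([float(v) for v in terms], float(exact))
```

Exact rational arithmetic is used so that the formula and the enumeration can be compared for equality rather than to within rounding.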

It is the variance between primary sampling units that contributes most 
heavily to the total variance in many subsampling situations, and it is on this 
contribution that the modifications proposed in this paper have their principal 
effect. The effect of area substratification is seen by comparing the variance 
between primary units given above with that obtained if area substratification 
were not used but other aspects of the design remained unchanged. In this
event the variance between primary units involves the variance of the ratio,
$R_{hj} = \sum_i X_{hij} / P_{hj} = X_{hj} / P_{hj}$, instead of the variance of the adjusted ratio,
$R_{hj(A)}$.


The relationship between the variance of $R_{hj}$ and that of $R_{hj(A)}$ within the
h-th primary stratum is given by


(10)    $\sigma^2_{R_{hj}} = \sigma^2_{R_{hj(A)}} + \sigma^2_{R_{hj} - R_{hj(A)}} + 2 \rho \, \sigma_{R_{hj(A)}} \, \sigma_{R_{hj} - R_{hj(A)}}$,


where $\sigma^2_{R_{hj} - R_{hj(A)}}$ is the variance of the difference between the adjusted and the
unadjusted ratios, and $\rho$ is the correlation between the adjusted ratio and the
amount of the adjustment. Thus, if the correlation is near zero or positive,
there will be a gain from the introduction of area substratification, although there
may be a loss if the correlation is highly negative. Essentially, the condition
for $\rho$ being equal to or near zero is the same as that for the sample estimate being
unbiased; namely, that the $P_{hij}/P_{hj}$ be uncorrelated or only slightly correlated
with the $R_{hij}$ within each substratum.³

The variance of $R_{hj(A)}$ rather than that of $R_{hj}$ occurs in the variance of X’
because the subsampling numbers were allocated proportionate to the $P_{hi}$,
no matter what primary sampling unit happened to be selected for inclusion in
the sample. The ratio $R_{hj}$, like $R_{hj(A)}$, may be regarded as the weighted average
of the $R_{hij}$, but with the weights equal to $P_{hij}$ instead of $P_{hi}$, and thus varying
from primary unit to primary unit. It would appear, therefore, from the rela-
tionship of the variances given above, that if the substrata are effective, and if
the $P_{hij}$ are highly correlated with the actual sizes of the substrata, the weighted
average using fixed weights in all primary units should have a considerably
smaller variance than that using variable weights. This turns out to be the
case in many practical situations, some illustrations of which will be given later
(see Sec. VII). 
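
The decomposition (10) can be verified directly for any one stratum, since the term $2\rho\,\sigma\,\sigma$ is simply twice the covariance between the adjusted ratio and the amount of the adjustment. The sketch below uses hypothetical measures of size and substratum totals for three primary units, weights each unit by its selection probability $P_{hj}/P_h$, and compares the two sides of (10). The helper names are illustrative only.

```python
from fractions import Fraction as F

def w_mean(values, weights):
    """Weighted mean with weights that sum to one."""
    return sum(w * v for v, w in zip(values, weights))

def w_cov(a, b, weights):
    """Weighted covariance; w_cov(a, a, w) is the weighted variance."""
    ma, mb = w_mean(a, weights), w_mean(b, weights)
    return sum(w * (x - ma) * (y - mb) for x, y, w in zip(a, b, weights))

P = [[4, 2], [2, 4], [3, 3]]        # hypothetical P_hij, rows are primary units
X = [[20, 3], [13, 6], [12, 5]]     # hypothetical X_hij
Q, S = len(P), len(P[0])
P_hj = [sum(row) for row in P]
P_hi = [sum(P[j][i] for j in range(Q)) for i in range(S)]
P_h = sum(P_hj)
w = [F(p, P_h) for p in P_hj]       # selection probabilities of the primary units

R = [F(sum(X[j]), P_hj[j]) for j in range(Q)]                    # unadjusted R_hj
R_adj = [sum(F(P_hi[i], P_h) * F(X[j][i], P[j][i]) for i in range(S))
         for j in range(Q)]                                      # adjusted R_hj(A)
D = [r - a for r, a in zip(R, R_adj)]                            # amount of adjustment

var_R = w_cov(R, R, w)
var_adj = w_cov(R_adj, R_adj, w)
var_D = w_cov(D, D, w)
cov_adj_D = w_cov(R_adj, D, w)      # equals rho * sigma_adj * sigma_D
print(float(var_R), float(var_adj), float(var_D), float(cov_adj_D))
```

Whether the adjusted ratio has the smaller variance depends on the sign and size of the covariance term, as the text states.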


3. The mean square error of ratio estimates for the specified subsampling 
system. The need for estimating a ratio from a sample arises in two cases; 
first, when the ratio is the population character for which an estimate is desired, 
and second, when the application of a ratio from the sample to a known total 
uses additional available information for obtaining an improved estimate of the 
desired total. 

Ratio estimates are desired as an end-result when, for example, the change in 
a characteristic from one time to another is being considered. Thus, if Y’ is 
the estimated total income of farm workers at one date, and X’ the corresponding 
estimated total income at a second date, then r’ = X’/Y’ is an estimate of the
relative change in the total income of farm workers over the period of time 
covered. Similarly, the estimate of a percentage such as the percentage of the 


3 Actually, a sufficient, although not necessary, condition for $\rho$ to be equal to zero is that
$P_{hij}/P_{hj}$ be uncorrelated with both the ratio $R_{hij}$ and the cross-product $R_{hij} R_{hgj}$ for all
pairs of substrata.
workers unemployed will involve the ratio of two random variables from the
sample. Ratio estimates from a sample may be particularly useful in instances 
where the reliability of the ratio estimate is greater than the reliability of the 
estimate of either the numerator or the denominator, as is frequently the case. 

Ratio estimates may be used as a means of obtaining an estimated aggregate 
value of a specified characteristic, if Y, the aggregate value of a second charac- 
teristic highly correlated with X is known exactly from independent sources, and
X’ and Y’, estimates of X and Y respectively, are available from the sample.
Thus


(11)    $X'' = (X'/Y')\,Y = r'Y$


is an estimate of the aggregate value of the specified characteristic. If the corre- 
lation, in successive samples, between X’ and Y’ is sufficiently high, the ratio 
estimate will be a more efficient estimate of X than will X’, the simple estimated 
total given earlier (7); but X’ will prove the more reliable estimate when the 
correlation is low. Thus, X”, when the correlation between X’ and Y’ is suffi-
ciently high, makes use of more of the relevant available information for esti- 
mating X than does X’. 
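
A minimal sketch of the ratio estimate (11) follows. The function name and the figures are hypothetical: the sample gives estimated totals X’ and Y’, and Y is taken as known from an independent source.

```python
def ratio_estimate(x_est, y_est, y_known):
    """Eq. (11): scale the known total Y by the sample ratio r' = X'/Y'."""
    r_est = x_est / y_est
    return r_est * y_known

# Hypothetical figures: the sample understates Y (960 against a known 1000),
# so the simple estimate X' = 480 is adjusted upward in the same proportion.
x_double_prime = ratio_estimate(480.0, 960.0, 1000.0)
print(x_double_prime)
```

The adjustment helps only to the extent that errors in X’ and Y’ move together from sample to sample, which is the correlation condition stated above.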

The application of ratio estimates to the specified subsampling system is 
considered below. 

(a) The estimated ratio and its mean square error. The estimate of the popula- 
tion ratio r = X/Y is: 


(12)    $r' = X'/Y'$,


where X’ is given in (7) above, and Y’ is a similar estimate of the total value of
a second characteristic. The mean square error of r’ is approximately


2 2X a P? Pi; Mniy —_m — Mnrj dX YV pase (Tass = Tri).
P
> x Pi(Fncay — 7)” =F  (Rajcays sy — Rnya:v)


4 The variance of the ratio of random variables of the form r’ = X’/Y’ is approximately
$\sigma^2_{r'} = r^2 (V^2_{X'} + V^2_{Y'} - 2 \rho_{X'Y'} V_{X'} V_{Y'})$, where V indicates the coefficient of variation of the
variable designated by the subscript, and $\rho_{X'Y'}$ is the correlation. Hence, if $\rho_{X'Y'}$ is suffi-
ciently large, $V^2_{r'}$ will be less than $V^2_{X'}$. The size of $\rho_{X'Y'}$ required depends on the relative
magnitudes of the coefficients of variation of X’ and Y’.
where


$X_{hijk}$ = the aggregate value of a specified characteristic for the elements in
the k-th subsampling unit within the h-i-j-th area, for which a total
is to be estimated;

$Y_{hijk}$ = the aggregate value of a second specified characteristic for the ele-
ments in the same subsampling unit, and for which the total in
the population is known;

$Y_{hij} = \sum_k Y_{hijk}$; and $Y_h = \sum_i \sum_j Y_{hij}$;

$\sigma^2_{hij:Y} = \sum_k (Y_{hijk} - \bar{Y}_{hij})^2 / M_{hij}$ is the variance of the sampling units in the h-i-j-th
area with respect to the second characteristic,
and $\bar{Y}_{hij} = Y_{hij} / M_{hij}$;

$R_{hj(A):Y} = \sum_i \frac{P_{hi}}{P_h} \frac{\bar{Y}_{hij}}{\bar{P}_{hij}}$ is the adjusted average of the $\bar{Y}_{hij}$, and

$r_{hijk} = X_{hijk} / Y_{hijk}$, etc., are the ratios of the X to the Y for the
areas indicated by the subscripts, and

$r_{hj(A)} = R_{hj(A)} / R_{hj(A):Y}$ and $r_{h(A)} = R_{h(A)} / R_{h(A):Y}$ are the ratios of the adjusted
ratios for X and Y indicated by the subscripts;

and the remaining symbols are as defined in the sections above where the ex- 
pected value and variance of X’ are given. 

The first and third terms of (13) are, ordinarily, the principal contributing 
terms. The second and fourth terms contain contributions due to the variation 
between the means of the substrata and the primary strata respectively even 
though the sample was stratified with respect to these classes. In some in- 
stances, the contributions of these terms will be important. The between 
strata contributions arise because the primary and subsampling units vary in 
size with respect to the character Y. 

This formula for the mean square error of a ratio is approximately equal to the 
one more commonly used given in footnote 4. The two formulas, both of which 
are approximations, would be identical if certain terms which are ordinarily 
negligible were retained in (13). This latter formula has the advantage of indi- 
cating the effect of different aspects of the design of the sample on the variance 
of the ratio. The derivation of this approximate variance formula is given in 
the Appendix, Section 3, together with an indication of the accuracy of the 
approximation. 
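
The commonly used approximation of footnote 4 lends itself to a short numerical sketch. Dividing it by $r^2$ gives the squared coefficient of variation of r’, and comparing that with $V^2_{X'}$ shows that the ratio is the more reliable of the two whenever $\rho_{X'Y'} > V_{Y'} / (2 V_{X'})$. The function names and the coefficients of variation below are hypothetical.

```python
import math

def ratio_rel_variance(v_x, v_y, rho):
    """Approximate squared coefficient of variation of r' = X'/Y' (footnote 4 over r^2)."""
    return v_x ** 2 + v_y ** 2 - 2 * rho * v_x * v_y

def min_correlation_for_gain(v_x, v_y):
    """Correlation above which V_r'^2 < V_X'^2, from v_y**2 - 2*rho*v_x*v_y < 0."""
    return v_y / (2 * v_x)

v_x, v_y = 0.10, 0.08                     # hypothetical coefficients of variation
threshold = min_correlation_for_gain(v_x, v_y)
for rho in (0.1, threshold, 0.9):
    print(rho, ratio_rel_variance(v_x, v_y, rho), v_x ** 2)
```

The threshold rises as Y’ becomes more variable relative to X’, which is the sense in which the required correlation depends on the relative magnitudes of the two coefficients of variation.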

(b) The estimated totals and their mean square errors. As mentioned earlier, 
two estimates of X, the aggregate value of a given characteristic for all ele- 
ments are X’ (7), and X” (11). The mean square error of X’ is given by (9) 
and that of X” is simply equal to $Y^2 \sigma^2_{r'}$, where $\sigma^2_{r'}$ is given approximately by (13).
The decision as to whether to use X’ or X” as an estimate of X depends, of
course, in the first instance, on whether Y is known, and in the second instance, 
on the relative magnitudes of the respective mean square errors given in (9) 
and (13). These may be approximated from prior knowledge concerning the 
relationships in the population under investigation, or they may be estimated 
from preliminary sample investigations. However, in instances where there is 
a positive correlation between the $X_{hijk}$ and the $Y_{hijk}$ within substrata, it is fairly
safe to assume that if the information necessary for the ratio estimate is avail- 
able, there will be little to lose and possibly considerable to gain from its use. 
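
The mean square error of X” follows in one line from (11), because Y is a known constant and X = rY. The second relation below is the usual first-order approximation $r' - r \approx (X' - rY')/Y$; it is stated here as an assumption of this sketch and is not claimed to be the route taken in the Appendix.

```latex
% Exact: the error of X'' = r'Y is Y times the error of r'.
% Approximate: first-order expansion of r' about r (an assumption of this sketch).
\begin{align*}
E\,(X'' - X)^2 &= E\,(r'Y - rY)^2 = Y^2\,E\,(r' - r)^2 = Y^2 \sigma^2_{r'} , \\
Y^2 \sigma^2_{r'} &\approx E\,(X' - rY')^2 .
\end{align*}
```

Read this way, X” tends to be the better estimate when the sample totals X’ and rY’ err in the same direction, so that their difference varies less than X’ itself.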

The use of (11) instead of (7) is often desirable when Y in (11) is the aggregate 
value of the actual sizes of the primary units, and Y’ is its estimate. This is 
particularly so if the measures of size used are not fairly precise measures of the 
actual sizes, and if, at the same time, the actual size is highly correlated with 
the character being estimated, in which case the use of ratio estimates will yield 
gains in both the between primary unit contribution and the within primary unit 
variance. (See Sec. VII for numerical illustrations.) However, if the measures 
of size are identical with the actual sizes (i.e., $P_{hijk} = Y_{hijk}$), the last two terms of
(13) are identical with the between primary unit contribution to the variance of 
X' (9), and only the within primary unit variance is affected by the ratio estimate. 

While it is fairly safe in practice, if Y is known, to make use of X”’ instead of 
X’ as the estimate of X, some care must be exercised to make sure that the 
$X_{hijk}$ has at least a moderately high average correlation with the $Y_{hijk}$, where
the correlations considered are those within substrata within primary sampling 
units. If this correlation is low, and if the size of the subsampling unit varies 
considerably, the ratio estimate may be considerably less efficient than the simple 
total estimate. On the other hand, if the measures of size of the various sub- 
strata and of the primary sampling units are fairly close measures of the actual 
size, and if the subsampling units have been carefully defined so that they do
not vary too greatly in size, the two estimates are likely to have about the same
efficiency. 


VI—SOME PHYSICAL PROPERTIES OF FREQUENTLY OCCURRING 
POPULATIONS THAT ARE BASIC TO THE SAMPLING
PRINCIPLES RECOMMENDED IN THIS PAPER 


Many actual populations are characterized by the following physical proper- 
ties: 
(i) The elements within a cluster are positively correlated with regard to a 
specified characteristic. 
(ii) Clusters containing large numbers of elements have greater internal hetero- 
geneity than clusters containing small numbers of elements. 
(iii) Increasing the size of the cluster brings in correlated elements (e.g., in
population or agriculture surveys larger clusters are formed by including
households or farms in adjacent areas).
The first of these properties is recognized implicitly in the literature where the
losses of efficiency through the use of large clusters as sampling units are fre- 
quently cited. In our experience the second and third properties hold just as 
commonly in actual populations, and ordinarily for the same populations for 
which the first property holds. 

The presence of these physical properties in combination within strata leads 
to the following mathematical relationships that have been used throughout 
this paper: 

(a) The sizes of the primary sampling units, $N_{hj}$, are negatively correlated
with the $\rho_{hj}$, the intra-class correlations between elements within the
units;

(b) The $N_{hj}$ and $N_{hj} \rho_{hj}$ are positively correlated;

(c) The $N_{hj}$ and $\sigma^2_{hj}$ are positively correlated;

(d) The $N_{hj}$ and $\sigma^2_{hj} / N_{hj}$ are negatively correlated.

The use of these relationships has determined most of the choices among 
alternative procedures throughout this paper. The relationships, of course, do 
not necessarily hold, and exceptions to them can be found [5]. The frequent 
occurrence of populations characterized by such properties justifies further re- 
search on the more effective use of these and other properties that may be found 
to hold. 


VII—SOME APPLICATIONS OF THE PRINCIPLES DESCRIBED 
IN THIS PAPER TO AN ACTUAL SAMPLING PROBLEM 


The analyses summarized below were carried out for the purpose of deciding 
between alternative sampling procedures in the revision of a monthly national 
sample for labor force and other characteristics. Budgetary and administrative 
restrictions made it necessary to confine the field operations to a limited number 
of administrative centers scattered over the country, from which a sample of 
less than one-tenth of one percent of the population of the United States was
to be drawn.

The original sample (the one to be revised) was of a usual subsampling design
in which counties were used as the primary sampling units, and households or
small clusters of households were used as the subsampling units. In the revised
sample contiguous counties were combined wherever administratively feasible,
to form more heterogeneous primary units than the individual counties.
Approximately 2000 primary sampling units were formed from the 3000 counties in
the United States. The combinations of counties, the primary stratification,
the area substratification, and the measures of size, were determined on the basis
of 1940 Decennial Census data together with more recent data where available.⁵

The applications of the various principles suggested in this paper have been 


5 See [11] for a full description of the proposed revised sample, including an outline of the 
criteria of stratification used. That paper may be useful as a simple description of an 
application of the specified subsampling system.
evaluated by estimating 1930 Census labor force characteristics from a sample
that was stratified on the basis of 1940 and more recent data. This constituted 
a particularly severe test of some of the methods, because of the substantial 
shifts that had taken place during the 10-year interval between 1930 and 1940. 

The analyses to be summarized in this section are concerned primarily with 
the gains obtainable under favorable circumstances by the introduction of three 
sampling principles; namely, 

(1) enlarged primary units (see Sec. IV-1); 

(2) the sampling of primary units with probability proportionate to measures
of their size (see Sec. IV-2);

(3) area substratification (see Sec. IV-3). 

Some comparisons are also given to illustrate the effect of using alternative 
sample estimating formulas. Computations have been made for six of the prin- 
cipal items that are currently being included in a monthly report of the labor 
force; namely, total numbers of male and female workers, total numbers of male 
and female agricultural workers, and total numbers of male and female non- 
agricultural workers. The comparisons between alternative systems have been 
made holding constant both the primary stratification criteria and the expected 
numbers of persons to be drawn into the sample. 

The percentage gains given below are the reductions in the between primary 
unit contributions (which include the bias contributions) to the mean square 
error.⁶ Except where otherwise specified, the sample estimate used is given
by (7).

1. Gains obtained by introducing enlarged primary units. The gains obtained 
by using enlarged primary units are calculated by comparing the mean square 
errors arising from the sampling design in which individual counties are primary 
units with the mean square errors arising from the design in which combinations 
of counties are the primary units. In both designs, the primary units are drawn 
with equal probabilities and no area substratification is used. For this compari- 
son, preliminary computations have been completed for only a limited number 
of strata and for two of the labor force items given above; namely, total male 
workers and total female workers. The reduction in the sampling errors ob- 
tained by introducing enlarged primary units is estimated to be 48 per cent for 
total male workers and 26 per cent for total female workers. 









6 The contribution of the variance within the primary units to the total mean square error
was relatively small in all instances, and practically unaffected by the introduction of the
various principles.

2. Further gains obtained by introducing probability proportionate to measures
of size. The further gains obtained by using the principle of sampling with
probability proportionate to measures of size are calculated by comparing the
mean square errors arising from the design in which the units are drawn with
equal probability with the mean square errors arising from the design in which
the units are drawn with probability proportionate to measures of size. In
both the designs, the primary units are combinations of counties, and in neither
of them is area substratification used. The estimated per cent gains are as
follows:


    Total Workers      Agricultural Workers      Nonagricultural Workers
    Male    Female     Male    Female            Male    Female
     50        8        77        6               19       21


The gains reflect both decreases in the sampling variance and the elimination
of the bias which arises when the primary units are drawn with equal
probabilities.⁷


3. Further gains obtained by introducing area substratification. The further 
gains obtained by using the principle of area substratification are calculated by 
comparing the mean square errors for the design in which area substratification 
is not used, with those for the design in which area substratification is introduced. 
In both these designs the primary units are combinations of counties, and are 
drawn with probability of selection proportionate to measures of their sizes. 
The estimated per cent gains are as follows: 


    Total Workers      Agricultural Workers      Nonagricultural Workers
    Male    Female     Male    Female            Male    Female
      6       31        46       51               32       22


4. Gains obtained by the integration of the above principles into a single sub- 
sampling system (the specified subsampling system). The gains obtained by 
using all three principles are calculated by comparing the mean square errors for 
the specified subsampling system (in which all three principles are used) with 
the mean square errors for the system in which none of these principles is used. 
In the specified subsampling system, combined counties are the primary units, 
the primary units are drawn with probability proportionate to measures of their 
size, and area substratification is used. In the other system, the primary units 
are individual counties, the sampling is done with equal probabilities and area 
substratification is not used. Preliminary computations for this comparison 
are available for only 2 of the 6 labor force items; namely, total male and total 
female workers. The estimated gains were 76 per cent for male workers and 53 
per cent for female workers. 


7 As indicated before, estimate (7) is used in both designs compared above. This esti- 
mate is unbiased for the design in which the primary units are drawn with probability pro- 
portionate to measures of size, but is biased for the design in which they are drawn with 
equal probabilities. However, for the latter design, the biased estimate is usually much 
more efficient than the best linear unbiased estimate. For the six labor force items, the 
best linear unbiased estimate gives rise to variances that are several times as large as the 
mean square errors for the biased estimate. 
Calculations are available for all 6 items to measure the gains obtained by
the use of the last two of the principles in combination; namely, probability 
proportionate to measures of size and area substratification. For measuring 
these gains, the systems are as described above, except that in both designs the 
primary units are combinations of counties. The estimated per cent gains are 
as follows: 


    Total Workers      Agricultural Workers      Nonagricultural Workers
    Male    Female     Male    Female            Male    Female
     54       37        88       54               45       39


While both the specified subsampling system and the alternative to which it was 
just compared are biased designs, the bias in the specified system is appreciably 
smaller than the bias in the latter. For example, while the bias of the specified 
system in the estimation of total male workers was less than one-half per cent 
of the true total male workers, the bias for the alternative design in the estima- 
tion of the same population character was more than one and one-half per cent. 
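
The three sets of percentages reported above can be checked against one another. The sketch below assumes that successive gains compound, that is, each reduction applies to the mean square error left by the preceding step. This is an assumption made for illustration and is not stated in the text. The function name `compose` and the list names are hypothetical.

```python
# Reported percentage reductions, in the order: total male, total female,
# agricultural male, agricultural female, nonagricultural male, nonagricultural female.
pps_gain = [50, 8, 77, 6, 19, 21]               # Sec. 2: probability proportionate to size
area_gain = [6, 31, 46, 51, 32, 22]             # Sec. 3: area substratification
combined_reported = [54, 37, 88, 54, 45, 39]    # Sec. 4: both principles together


def compose(*gains_pct):
    """Combine successive percentage reductions, assuming (for illustration only)
    that each one applies to the mean square error left by the previous one."""
    remaining = 1.0
    for g in gains_pct:
        remaining *= 1.0 - g / 100.0
    return 100.0 * (1.0 - remaining)


combined_implied = [compose(a, b) for a, b in zip(pps_gain, area_gain)]
for implied, reported in zip(combined_implied, combined_reported):
    print(round(implied, 1), reported)

# All three principles, for the two items with preliminary results
# (enlarged units: 48 and 26 per cent; reported totals: 76 and 53 per cent).
all_three_implied = [compose(48, 50, 6), compose(26, 8, 31)]
print([round(v, 1) for v in all_three_implied])
```

Under this assumption the implied figures fall within about one percentage point of the reported ones, which is the agreement one would expect from rounded entries.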


5. The choice of estimate to use with the specified subsampling system. The 
simple estimate (7) given for the specified subsampling system may be improved 
on by the use of regression techniques (see Sec. III). However, such techniques 
may require a great deal of clerical work, so that they frequently cannot be used 
in practice. As indicated in the last part of Sec. V, however, if certain inde- 
pendent information such as a knowledge of the total population is available, a 
simple ratio estimate of the form of (12) may sometimes introduce gains over 
(7). The use of the ratio estimate may be particularly desirable when the 
correlation between the measures of size and the actual sizes of the primary 
sampling units is only moderately high, and when, at the same time, the actual 
sizes are highly correlated with the values for the character being estimated. 
A small-scale experiment in the sampling for labor force items indicated that for 
estimating total male workers for 1930, both the variance between primary units 
and the variance within primary units for the ratio estimate (12) were approxi- 
mately one-half that for the simple estimate (7). The use of the ratio estimate 
had very little effect in the estimation of the remaining five labor force character- 
istics. The reduction in variance of the total male employment figure was 
brought about because migration since 1930 reduced the correlation between
the 1930 and 1940 sizes, and furthermore, the number of male workers is highly 
correlated with the total population. Similar reductions for the variances of 
the other five items were not obtained because the correlations with actual sizes 
for the other items were not as high. 


6. Some final remarks. The gains just obtained arose from application of 
the sampling principles enumerated above. The situations that these principles 
were applied to are favorable, but are frequently met in practice. The principles 
differ in their effect depending on the particular attributes of the population 
being studied. The use of enlarged primary units may be desirable whenever
the enlarged units are internally more heterogeneous than are the smaller units.
The selection of primary units with probability proportionate to size is desirable
for the general classes of populations described in Sec. VI whenever the primary
units vary considerably in size. The use of area substratification is limited to
sampling situations where large primary units are used. The joint effect of all 
three principles shows to greatest advantage when subsampling is used, the 
primary units are large, but variable in size, and the number of primary units 
included in the sample is limited by cost or administrative conditions. The 
types of estimates described in Sec. III may be effective in a large number of 
physical situations other than those mentioned in this paper. 
APPENDIX

1. The effect of the consolidation of the primary units on the sampling
variance (see Sec. IV-1). Let $\bar X_1' = \sum_j \sum_k X_{jk}/qn$ be the average for the sample
where the primary units are the original units and where $X_{jk}$ is the value of the
k-th element in the j-th primary unit; q is the number of primary units in the
sample, and n is the number of elements sampled from each of the q primary
units. The variance of $\bar X_1'$ is

(14)  $\sigma^2_{\bar X_1'} = \frac{N-n}{(N-1)nq}\,\sigma^2_{1w} + \frac{Q-q}{(Q-1)q}\,\sigma^2_{1b}$

where Q is the number of original primary units in the population; N is the
number of elements in each original primary unit; $\sigma^2_{1w} = \sum_j \sum_k (X_{jk} - \bar X_j)^2/QN$
is the variance within the original primary units, with $\bar X_j = \sum_k X_{jk}/N$; and
$\sigma^2_{1b} = \sum_j (\bar X_j - \bar X)^2/Q$ is the variance between the original primary units, with
$\bar X = \sum_j \bar X_j/Q$. Let

(15)  $\sigma^2 = \sum_j \sum_k (X_{jk} - \bar X)^2/QN = \sigma^2_{1w} + \sigma^2_{1b}.$

Then

(16)  $\sigma^2_{1b} = \sigma^2[1 + \rho_1(N - 1)]/N$

where $\rho_1 = E(X_{jk} - \bar X)(X_{jl} - \bar X)/\sigma^2$, $k \ne l$, is the intra-class correlation⁸ between
elements in the original units.
From (15) and (16)

(17)  $\sigma^2_{1w} = \frac{N-1}{N}\,\sigma^2(1 - \rho_1).$

Hence

(18)  $\sigma^2_{\bar X_1'} = \frac{N-n}{N}\,\frac{\sigma^2}{nq}\,(1 - \rho_1) + \frac{Q-q}{(Q-1)q}\,\frac{\sigma^2}{N}\,[1 + \rho_1(N - 1)].$

Similarly, the variance of $\bar X_2'$ is

(19)  $\sigma^2_{\bar X_2'} = \frac{CN-n}{CN}\,\frac{\sigma^2}{nq}\,(1 - \rho_2) + \frac{Q-Cq}{(Q-C)q}\,\frac{\sigma^2}{CN}\,[1 + \rho_2(CN - 1)]$

where $\bar X_2'$ is the mean for the enlarged primary units, $\rho_2$ is the intra-class
correlation between elements in the enlarged primary units and C is the number of
original units combined to form each enlarged unit. Then

(20)  $\sigma^2_{\bar X_1'} - \sigma^2_{\bar X_2'} = \frac{\sigma^2}{qN}\left[\frac{(q-1)(C-1)}{(Q-1)(Q-C)} + \rho_1 a_1 - \rho_2 a_2\right]$

where $a_1 = \frac{(Q-q)(N-1)}{Q-1} - \frac{N-n}{n}$ and $a_2 = \frac{(Q-Cq)(CN-1)}{C(Q-C)} - \frac{CN-n}{Cn}$.
Since

$a_1 - a_2 = \frac{(C-1)(q-1)(QN-1)}{(Q-1)(Q-C)} \ge 0$ and $\frac{(q-1)(C-1)}{(Q-1)(Q-C)} \ge 0,$

then a gain is brought about by enlarging primary units whenever $\rho_1 > \rho_2$,
where $\rho_1$ and $\rho_2$ are both positive.
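
As a numerical check on the algebra leading to (20), the following sketch evaluates (18) and (19) directly and compares their difference with the closed form (20). It assumes the formulas read as $\sigma^2_{\bar X_1'} = \frac{N-n}{N}\frac{\sigma^2}{nq}(1-\rho_1) + \frac{Q-q}{(Q-1)q}\frac{\sigma^2}{N}[1+\rho_1(N-1)]$ and its analogue with CN, Q/C and $\rho_2$. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def var_original(sigma2, rho1, N, n, Q, q):
    """Variance of the sample mean with original primary units, as in (18)."""
    within = (N - n) / N * sigma2 / (n * q) * (1 - rho1)
    between = (Q - q) / ((Q - 1) * q) * sigma2 / N * (1 + rho1 * (N - 1))
    return within + between


def var_enlarged(sigma2, rho2, N, n, Q, q, C):
    """Variance of the sample mean with C original units per enlarged unit, as in (19)."""
    within = (C * N - n) / (C * N) * sigma2 / (n * q) * (1 - rho2)
    between = (Q - C * q) / ((Q - C) * q) * sigma2 / (C * N) * (1 + rho2 * (C * N - 1))
    return within + between


def gain(sigma2, rho1, rho2, N, n, Q, q, C):
    """Difference of the two variances in the closed form (20)."""
    a1 = (Q - q) * (N - 1) / (Q - 1) - (N - n) / n
    a2 = (Q - C * q) * (C * N - 1) / (C * (Q - C)) - (C * N - n) / (C * n)
    const = (q - 1) * (C - 1) / ((Q - 1) * (Q - C))
    return sigma2 / (q * N) * (const + rho1 * a1 - rho2 * a2)


# Hypothetical population: 300 units of 50 elements, 20 units sampled, 10 elements each,
# units combined in threes; intra-class correlation falls from 0.10 to 0.04.
params = dict(sigma2=1.0, N=50, n=10, Q=300, q=20)
rho1, rho2, C = 0.10, 0.04, 3

direct = var_original(rho1=rho1, **params) - var_enlarged(rho2=rho2, C=C, **params)
via_20 = gain(rho1=rho1, rho2=rho2, C=C, **params)
print(direct, via_20)
```

With these illustrative values the two computations agree and the difference is positive, in line with the condition $\rho_1 > \rho_2 > 0$.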


8 For definitions and properties of intra-class correlations, see Secs. 38-40 of Statistical
Methods for Research Workers, R. A. Fisher, and [5].

2. Comparison of variances of certain alternative subsampling systems where
the primary units are of unequal sizes. The development of (4), the formula
for the difference between the variances of sample estimates compared in Sec.
IV-2, is given below. We shall confine ourselves to the simple case where only
one primary sampling unit is drawn into the sample from each stratum. Let

(21)  $\bar X' = \sum_h N_h \bar X_h'/N$

be the sample estimate used for each of the three designs to be compared, where
$\bar X_h' = \bar X_{hj}' = \sum_k X_{hjk}/n_{hj}$, and $X_{hjk}$ is the value of the k-th element in the j-th
primary unit in the h-th stratum; L is the number of strata; $n_{hj}$ is the number
of elements drawn into the sample from the j-th primary unit in the h-th stratum
with $N_{hj}$ the corresponding total number, $N_h = \sum_{j=1}^{Q_h} N_{hj}$ with $Q_h$ = the number of
primary units in the h-th stratum, and $N = \sum_{h=1}^{L} N_h$. If the subsampling within a
stratum is of a constant proportion, C, as in the first of the subsampling systems
mentioned, $n_{hj}$ in the above estimate is equal to $C N_{hj}$. If the subsampling
within a stratum is of a constant number, as in the second subsampling system
mentioned, as well as in the recommended system, $n_{hj}$ is equal to
$\bar n_h = C \sum_j N_{hj}/Q_h = C \bar N_h$.
We shall denote the sample estimate for the first design by $\bar X_1'$, that for the
second design by $\bar X_2'$, and that for the recommended design by $\bar X_3'$.
The expected values of the sample estimates for the first two designs, $\bar X_1'$
and $\bar X_2'$, are the same, and are equal to

$E\bar X_1' = E\bar X_2' = \frac{1}{N} \sum_h N_h \frac{1}{Q_h} \sum_j \bar X_{hj}$

where $\bar X_{hj} = \sum_k X_{hjk}/N_{hj}$. Thus, since this expected value is not, in general, equal to
$\sum_{h,j,k} X_{hjk} / \sum_{h,j} N_{hj} = \bar X$, both $\bar X_1'$ and $\bar X_2'$ are biased estimates of $\bar X$.

For the recommended design, in which the primary unit is drawn with
probability of selection proportionate to size and a constant number taken from the
sampled units within a stratum, the expected value of the sample estimate is

(22)  $E\bar X_3' = \frac{1}{N} \sum_h N_h \sum_j \frac{N_{hj}}{N_h}\,\bar X_{hj} = \bar X$

and therefore the estimate for the recommended design is unbiased.
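
The bias argument can be illustrated numerically. The sketch below takes a small, entirely hypothetical population (two strata, three primary units each, given as size and unit mean) and computes the expected value of the estimate (21) when one primary unit per stratum is drawn with equal probability and when it is drawn with probability proportionate to size. Subsampling within the unit is assumed unbiased for the unit mean, so only the first-stage expectation matters here.

```python
# Hypothetical strata: each primary unit is (N_hj, unit mean X-bar_hj).
strata = [
    [(100, 0.30), (400, 0.45), (1500, 0.60)],
    [(50, 0.20), (60, 0.25), (900, 0.55)],
]

N = sum(size for stratum in strata for size, _ in stratum)
pop_mean = sum(size * mean for stratum in strata for size, mean in stratum) / N


def expected_estimate(strata, pps):
    """Expected value of sum_h N_h * (mean of the selected unit) / N
    with one primary unit selected per stratum."""
    total = 0.0
    for stratum in strata:
        N_h = sum(size for size, _ in stratum)
        Q_h = len(stratum)
        if pps:
            # selection probability N_hj / N_h, as in the recommended design
            e_h = sum(size / N_h * mean for size, mean in stratum)
        else:
            # selection probability 1 / Q_h, as in the first two designs
            e_h = sum(mean for _, mean in stratum) / Q_h
        total += N_h * e_h
    return total / N


e_equal = expected_estimate(strata, pps=False)
e_pps = expected_estimate(strata, pps=True)
print(pop_mean, e_equal, e_pps)
```

In this illustration the equal-probability expectation differs from the population mean because unit means vary with unit size, while the proportionate-to-size expectation reproduces it exactly.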
The mean square error of $\bar X_1'$ is

(23)  $\sigma^2_{\bar X_1'} = \frac{1}{N^2} \sum_h N_h^2 \left[ \frac{1}{Q_h} \sum_j \frac{N_{hj} - n_{hj}}{N_{hj} - 1}\,\frac{\sigma^2_{hj}}{n_{hj}} + \frac{1}{Q_h} \sum_j (\bar X_{hj} - \bar X_h)^2 \right]$
      $+ (E\bar X_1' - \bar X)^2 - \frac{1}{N^2} \sum_h N_h^2 (\bar{\bar X}_h - \bar X_h)^2$

where $\sigma^2_{hj} = \sum_k (X_{hjk} - \bar X_{hj})^2/N_{hj}$ is the variance between elements within the
j-th primary sampling unit of the h-th stratum, $\bar{\bar X}_h = \sum_j \bar X_{hj}/Q_h$, $\bar X_{hj} = X_{hj}/N_{hj}$;
and $\bar X_h = \sum_j X_{hj} / \sum_j N_{hj} = \sum_j N_{hj} \bar X_{hj} / \sum_j N_{hj}$. The first term in the square
bracket of (23) is the contribution of the variance within primary units. The
second term in the square bracket is an approximation to the mean square error
between primary units and the remaining terms give the error in this approximation.
The mean square error of $\bar X_2'$ is given by the same formula but with $n_{hj}$ replaced
by $\bar n_h$.
The difference between $\sigma^2_{\bar X_1'}$ and $\sigma^2_{\bar X_2'}$ is

(24)  $\frac{1}{CN^2} \sum_h \frac{N_h^2}{Q_h} \sum_j \sigma^2_{hj}\,\frac{N_{hj}}{N_{hj} - 1} \left( \frac{1}{N_{hj}} - \frac{1}{\bar N_h} \right)$

which will be positive if $\sigma^2_{hj}/N_{hj}$ is negatively correlated with $N_{hj}$, as is almost
invariably the case in practice (see Sec. VI). Thus, since $\sigma^2_{\bar X_1'}$ ordinarily is
larger than $\sigma^2_{\bar X_2'}$, it will suffice to compare $\sigma^2_{\bar X_2'}$ with $\sigma^2_{\bar X_3'}$ to show that the
recommended subsampling system is more efficient than either of the first two
mentioned.


The variance for the recommended design is

(25)  $\sigma^2_{\bar X_3'} = \frac{1}{N^2} \sum_h N_h^2 \left[ \sum_j \frac{N_{hj}}{N_h}\,\frac{N_{hj} - \bar n_h}{N_{hj} - 1}\,\frac{\sigma^2_{hj}}{\bar n_h} + \sum_j \frac{N_{hj}}{N_h} (\bar X_{hj} - \bar X_h)^2 \right].$

For comparing the mean square error of $\bar X_2'$ with the variance of $\bar X_3'$ we shall
define

$\rho_{hj} = E(X_{hjk} - \bar X_h)(X_{hjl} - \bar X_h)/\sigma_h^2, \quad k \ne l,$

as the intra-class correlation coefficient between elements within the j-th primary
unit, where $\sigma_h^2$ is the variance between all elements within the h-th stratum. In
this comparison, the terms outside the square brackets in (23) have been
ignored because their contribution to the mean square error is either positive or
negligible. Then,


(26)  $\sigma^2_{\bar X_2'} - \sigma^2_{\bar X_3'} = \frac{1}{N^2} \sum_h \frac{N_h^2}{Q_h} \sum_j \frac{N_{hj}}{N_{hj} - 1}\,\frac{\sigma^2_{hj}}{\bar n_h} \left( 1 - \frac{N_{hj}}{\bar N_h} \right) + \frac{1}{N^2} \sum_h \frac{N_h^2}{Q_h}\,\sigma_h^2 \sum_j \rho_{hj} \left( 1 - \frac{N_{hj}}{\bar N_h} \right).$

The second term of this difference was given in Sec. IV-2 as the approximate
difference, and the first term was neglected. To examine the relative magnitudes
of the two terms we shall write

(27)  $\frac{N_{hj}}{N_{hj} - 1}\,\sigma^2_{hj} = \sigma_h^2 (1 - \delta_{hj}).$

Then

(28)  $\sigma^2_{\bar X_2'} - \sigma^2_{\bar X_3'} = \frac{1}{N^2} \sum_h N_h^2 \sigma_h^2 \left[ \frac{1}{\bar n_h}\,\frac{1}{Q_h} \sum_j \delta_{hj} \left( \frac{N_{hj}}{\bar N_h} - 1 \right) - \frac{1}{Q_h} \sum_j \rho_{hj} \left( \frac{N_{hj}}{\bar N_h} - 1 \right) \right].$

For the general class of populations given in Sec. VI the covariance between
$\delta_{hj}$ and $N_{hj}$, and also that between $\rho_{hj}$ and $N_{hj}$, will be negative. Moreover,
in many practical problems of this class the two covariances will be of
approximately the same magnitude. In such instances the first term of (28) will be
equal in absolute value to $1/\bar n_h$ times the second, and thus smaller than the second
term for all $\bar n_h > 1$, and much smaller for moderately large values of $\bar n_h$. For
example, in populations made up of clusters of different sizes for which the
conditional probability of an element having a particular property for a fixed size
of cluster is the same for all sizes of clusters, the two covariances will be very
nearly equal. A number of practical problems approximate this situation. Moreover,
even in the situations where the covariance of $\delta_{hj}$ and $N_{hj}$ is several times that
of $\rho_{hj}$ and $N_{hj}$, say 5 times as large, then the second term will be larger than the
first for all $\bar n_h > 5$.

Some numerical illustrations of the gains obtained through the use of the 
recommended system are given in Sec. VII, and for some of the items for which 
results are summarized in that section the gains were substantial. 


3. The derivation of the variance formulas (13) and (9). The mean square
error of a ratio of random variables is generally approximated from Taylor's
expansion. If $X'$ and $Y'$ are random variables, $Y' > 0$, and r is the population
character of which $X'/Y' = r'$ is an estimate, then

(29)  $E\left( \frac{X'}{Y'} - r \right)^2 = E\,\frac{Y'^2}{(EY')^2} \left( \frac{X'}{Y'} - r \right)^2 + E \left( 1 - \frac{Y'^2}{(EY')^2} \right) \left( \frac{X'}{Y'} - r \right)^2.$

The first term in the right-hand side of (29) is a first approximation to the mean
square error from Taylor's expansion, and the second term is the error in this
approximation.


Eq. (13), and as a special case (9), is derived as follows: 


(30) E(r’ —rY’ =E 


Let Wri = Vaije(rain — ot des le 3. Then, setting 


E EEF bu eee Yt ra)- 212-3 


(31) th “FT EY’ \Y’ 


Ee = EY"(r — r)?/(EY'Y 


is the first approximation to the mean square error. 
Since EY’ is evaluated in the same way as E.X’(8), it is merely necessary to 
y Wurs2 2 a Ty 
evaluate EY’*(r’ — r)*, the numerator of £6". Now 
a e Sh 1 mhij 
EY"(r — ry =E L DD De Waite 
h t 7 k 
hha th lt 
hea 


Sh Mhij 


1 Sh 
where y, = io ie 2d Vriik = ie Wii « 


t 
,
Since E >> Z Ve=E » » Wis/th + E X 2» Vri Wrr/th 
iver 


. 1 19 
(32) EY” (r’ - r) =E X é a Wri + E ea a2 Vri Vie + E 2 Vr Vo. 
"i 


nl, 4 ‘” 


The first term in the right-hand side of (32) is 


l 1 Pj Mri; Maz — My 
72. Phi Mnrij M nig — Muni 
33) E 2» y ht t h,i,7 ty P,, Mii Mis a X Whisk 
1 Pj Mnrij Mnrig —_ — 


+ h,isd ti, P, Maj May a 1 -< Wrijk) 
The second term of (32) is 


] / / 
(34) # Us pm Yruivnr = 
h th ur 
tyér 
where 
=) Whisk 5 
k 


and the third term of (32) is 


viv 1 Pj Mrij F 1 Piz mri 
5 = PR es. ‘i aia 2 2 éj 
(35) ERY, ‘=[Dt th P, Mii; vn | La =|D 7 P, Mi Vrs J. 


wie th i, tJ 


Therefore EY” (r’ — r)’ = (33) + (34) + (35), and when Yrigk(Trijsk — 71) is 
substituted for yai;, we have 


P,,; Mri Mi; — Mii 
Ey” SP an [oe ae a7 ie 
(7 — 9) = D3] De ee Me — Valo 


Prj Mnhij Mri — 1 
. x P, Mig Mn - tt ( Y niin (Train r) 


+2ee Pai a a 7 oy Mira =i r)P 


« = Paj mrss _ Y2.(rj — 1) 
47 P, Misi — 


” eo an Se _, (This =~_ Yui} | 


+[= 1 Pa ma Yatra — oh] 
i, th P, Mii hij\Ohij . 
By substituting (Thi je — Pes ot is r)’ for (Thi jx = ry in the first term of 
(36) and P,:M);;/Pxijmai; for 1/t, in the 1st, 2nd, and 4th terms, the sum of these 
three terms becomes 


Pi; Maj Pis 
= F + r tik ~~ < 
hank Ph Mag Fr. - Yn uel rasie — Tos)” 
Pi; Mia; Pi; 
1s aS 
* o P, Mnij Pyj 
Pr; Mis; Pis ‘| 2 £4 
: Pj go © Vik — 
2B, P, Mnriz r. . (rs ) x - Miss 
where Pri; = (Maj = mnrij)/(Mni; _ 1) and Trig = » Xnsit/ DL Vase c 


When we substitute the appropriate value for 1/t, in the 3rd, 5th, and 6th 
terms of (36), the sum of these terms becomes 


P,,; Pri oo P _ Paj Pri = T 
2d P,, |= Prij V nis (Tai | x [> e P, Pry Vis (tri r) 


+ 4,2 


Py Pri 7 
+ Pe Pi Pris Y nas (rij nf. 


Py n . Xny _ Fos ) = na mci 
(39) » 7 V sj (Tri ” he Pri (Fa Pras r Pi(Rajcay — 1Rrjca):y) 


F hij Visine (1raie oP Tri) (rrsj i r) 


(38) 


Now 


= Px Rnicay:y (Faicay — 1) 
where 745.4) = Rajcay/Rajcay:y , and 
40) 4 a pe Ynij(Trs3 — 7) = a Pai(Rajcay — TRajcay:v) = Pr(Rncay — TRacay zy) 
= Pi Ra: (Fuay — 7) 
where fra) = Rrycay/Rua):y - 
Substituting (39) and (40) in (38), we have 
z (Pai/Pr) Ph Biscay: (Fria) — 7) — X Pi Ricay: Pca) — 1)” 


hi 


41 
(41) + [i Ps, Ricay:v Facay — ry)’. 


By substituting (7s j.4) — Facay + Poca) — 7) for (Frjcay — r)’ in the first term 
in (41) and expanding, (41) becomes 


2P. 
2d Py x Ria): y (Fajvay — Facay)” +2 > 1 Ph = Rita): v (Fricay — Frcay (Frcay — 7) 
(42) ” 


+ DP; n(Frcay — 1)” bs 1 Riw:r — Ria: | + [i Px Rica): (Frcay — 1) - 
Hence, since $(EY')^2 E\theta^2$ = (37) + (42),
2 o* —— oe. 2 
Pi; Mia; — Mn; » V nase (Thiik Thij) 
"P, May — 1 munis M naj Phas - 


» 2 Prj Mi im Mj X Visine (Tris = Trig) (Trij — 
re > oe tL Re. 
hi P, Mn — 1 mas; Mri P3,, 


2 
> p2. Pi3 Muy — nj 2 (Tai — 1) 
Ty ™ PM eT ns Ps 
ted Pi ay — | Mri Prij 


+ 2 Pi(Pri/Px)Rhicay:v (Faicay — Fyay)” 
02 


(EY'Y Ee = a Pi 


+2 D> PX(Pai/Px)Rhicay:x(Faicay — Fray) (Frcay — 7) 
02 
+ > Pi(?nay — 7)°(Pai/Px)(Rawo:y — Riw:v) 
? 


+ ( Pi Rya:r(Fray — 7)? 


Mhij Mhij 


where Oh jv — 2 (Vai jx = Yaris) [Mii and Yuuj = » Vai jx/Mni; — Ynij/Mni;- 


The approximation to $E(r' - r)^2$ is given by (43) divided by $(EY')^2$. By
ignoring the 2nd, 5th, and 7th terms which are negligible for a large class of
populations, we obtain (13).

The variance of $X'$ is derived from (43) by simply substituting $P_{hij}/P$ for
$Y_{hijk}$ in (43). This follows from the considerations given below:

Since $r' = X'/Y'$, and $X'$ is the numerator of $r'$, $\sigma^2_{X'}$ is given by $\sigma^2_{r'}$ when the
denominator, $Y'$, is identically equal to unity in repeated samplings.

1 My Pri 


Since — = = 


th mniz Prij Mrij com nae 


the denominator of r’ which is equal to 
1 mnij _ Pri 
iek mnij Prij 
when Yaijz is set equal to P,;;/P where P = =P). 
The formula for the mean square error of X’ (9), of course is exact since the 
error term 


Ynijx, Will be identically equal to unity in repeated sampling 


E{Y°/(EY')}{r — ry =0. 


It may be pointed out that ox, may be obtained directly and more simply 
without the use of (29) since X’ is not estimated from the ratio of random 
variables. 


From (29), the error term for the approximation to $E(r' - r)^2$, (43)/$(EY')^2$, is
given by $E\{1 - Y'^2/(EY')^2\}\{r' - r\}^2$. This cannot be expressed as a simple
function of the individual observations, but useful maxima and minima for it may be
obtained. Upper and lower bounds of the variance of $r'$ are obtained simply
from the following inequalities, which hold independent of the joint distribution
of $X'$ and $Y'$.

(44)  $E\,\frac{Y'^2}{Y'^2_{max}} (r' - r)^2 \le E(r' - r)^2 \le E\,\frac{Y'^2}{Y'^2_{min}} (r' - r)^2$

where $Y'_{max}$ is the maximum value of $Y'$, obtained simply by choosing or
estimating the largest $Y'_h$ for each stratum. $Y'_{min}$ (the minimum value of $Y'$)
is obtained in a similar manner.

Eq. (44) when evaluated turns out to be

(45)  $\frac{(EY')^2 E\theta^2}{Y'^2_{max}} \le E(r' - r)^2 \le \frac{(EY')^2 E\theta^2}{Y'^2_{min}},$

where $(EY')^2 E\theta^2$ is given by (43).

Eq. (45) will serve adequately as an indicator of the accuracy of $E\theta^2$ for
sampling systems in which the variability of the Y's within strata is restricted.
However, in other designs, where stratification is not used and the variability in the
Y's is not restricted, the limits given by (45) may be too broad to be useful.
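
The decomposition (29) and the bounds (44) hold for any joint distribution of $X'$ and $Y'$ with $Y' > 0$, so they can be checked on a small discrete example. The three-point distribution and the value of r below are hypothetical and serve only to exercise the identities.

```python
# Hypothetical joint distribution of (X', Y'): (probability, x, y), with y > 0.
outcomes = [(0.2, 9.0, 20.0), (0.5, 12.0, 25.0), (0.3, 16.0, 30.0)]
r = 0.5  # hypothetical population ratio that r' = X'/Y' estimates

EY = sum(p * y for p, _, y in outcomes)

# Left side of (29): the mean square error of r'.
mse = sum(p * (x / y - r) ** 2 for p, x, y in outcomes)

# Right side of (29): first approximation plus the error term.
first_term = sum(p * (y / EY) ** 2 * (x / y - r) ** 2 for p, x, y in outcomes)
error_term = sum(p * (1 - (y / EY) ** 2) * (x / y - r) ** 2 for p, x, y in outcomes)

# Bounds (44): E Y'^2 (r' - r)^2 divided by the largest and smallest Y'^2.
numerator = sum(p * (x - r * y) ** 2 for p, x, y in outcomes)  # Y'(r' - r) = X' - rY'
y_max = max(y for _, _, y in outcomes)
y_min = min(y for _, _, y in outcomes)
lower, upper = numerator / y_max ** 2, numerator / y_min ** 2

print(mse, first_term + error_term, lower, upper)
```

The bounds are tight when $Y'$ varies little, as in this example, and widen as the ratio $Y'_{max}/Y'_{min}$ grows, which matches the remark about unstratified designs.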
MULTIPLE SAMPLING WITH CONSTANT PROBABILITY


By Walter Bartky


The University of Chicago 


1. Introduction. In an attempt to reduce inspection costs, manufacturers
have frequently resorted to sampling procedures in which the disposition of an
aggregate or lot of similar units does not necessarily depend upon the results 
of a single sample. In practice, however, the number of permissible additional 
samples is limited to one or two; nevertheless, if the lot is very large, an appre- 
ciable reduction in the expected sample may be accomplished by allowing a 
greater number of additional samples. In this article probability formulae will 
be derived for an inspection procedure for infinite lots in which the number of 
additional samples is not limited and may be any number depending upon the 
results of the sampling. This development will be limited to the simple case of 
attribute inspection in which the units fall into two categories—satisfactory 
units or defective units. If p denotes the fraction defective in an infinite lot, 
then the probability of finding exactly m defective units or defects in a sample 
of n is 


(1)  $P(m, n) = \binom{n}{m} p^m q^{n-m}, \qquad q = 1 - p.$


Since P(m, n) is the probability of m successes in n trials with constant probability 
of success p, though the terminology of commercial inspection will be used in 
this article, the results are applicable to other situations involving repeated trials 
with constant probability of success. 

In contrast with multiple sampling, a single sample inspection procedure for 
lots of the type here considered is one in which a lot of units is accepted or re- 
jected on the basis of the number of defective units found in the sample. Thus 
a lot is accepted if the number of defects is at most an integer c, the “acceptance
number,” and rejected if the number exceeds c. For an infinite lot containing a 
fraction p of defects and a sample of n units, the probability of accepting is by (1) 


(2)  $\Pi_s(c, n) = \sum_{m \le c} P(m, n),$


and the probability for rejection is the difference between this sum and unity. 
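
Equations (1) and (2) translate directly into a few lines of code. The sketch below is a minimal illustration; the function names `binom_pmf` and `accept_single` and the sampling plan in the loop are hypothetical.

```python
from math import comb


def binom_pmf(m, n, p):
    """P(m, n) of eq. (1): probability of exactly m defects in a sample of n
    from an infinite lot with fraction defective p; zero outside 0 <= m <= n."""
    if m < 0 or m > n:
        return 0.0
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)


def accept_single(c, n, p):
    """Eq. (2): probability that a single sample of n shows at most c defects."""
    return sum(binom_pmf(m, n, p) for m in range(c + 1))


# Hypothetical plan: sample 50 units, accept with at most 2 defects.
for p in (0.01, 0.05, 0.10):
    print(p, round(accept_single(2, 50, p), 4), round(1 - accept_single(2, 50, p), 4))
```

Each printed row gives the fraction defective, the acceptance probability, and the rejection probability as its complement.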


2. Multiple sampling. The procedure in multiple sampling is to examine
first an initial sample of $n_0$ units. If the number of defects in this initial sample
is at most c the lot is accepted and if the number of defects exceeds c + k (k an
integer) the lot is rejected. But if the number of defects is greater than c and
less than c + k + 1 an additional sample is removed and examined. In the
latter case similar criteria determine whether the lot is to be accepted or rejected
or this method of sampling continued. With an infinite lot this scheme of
sampling has an infinite variety of forms but there are certain advantages in limiting
this discussion to the following type of multiple sampling procedure.

I. Sample Sizes: The initial sample is of $n_0$ units but all additional samples
are of the same size, namely n units.

II. Condition for Acceptance: The lot is accepted if the number of defects in
the initial sample of $n_0$ units is at most c or if after taking r additional samples of n
the total number of defects in the $n_0 + rn$ units examined equals c + r.

III. Condition for Rejection: The lot is rejected if the number of defects in
the initial sample of $n_0$ units exceeds c + k or if after taking r additional samples of
n the total number of defects exceeds c + r + k.

IV. Condition for an Additional Sample: An additional sample of n is taken 
only if neither condition II nor condition III is realized. 

Thus in this sampling scheme the level for acceptance as well as the level for 
rejection increases by unity for each additional sample of n. If at the r-th addi- 
tional sample a lot is neither accepted nor rejected then the total number of 
defects in initial plus additional samples must equal one of the k numbers 


$c + r + 1,\ c + r + 2,\ \cdots,\ c + r + k.$


Denote the probabilities for obtaining these numbers by
(3)  $P_1(r),\ P_2(r),\ \cdots,\ P_k(r)$


respectively, the subscript indicating the number of defects in excess of the ac- 
ceptance level. 

To be accepted on the (r + 1)-st additional sample, (a) no defect must be
found in the (r + 1)-st additional sample and (b) a total of c + r + 1 defects
must be found in previous samples. The probability of (a) is given by (1),
taking m equal to zero, and the probability of (b) is the first one in the set (3).
Consequently the probability of accepting a lot on the (r + 1)-st additional
sample is

$P_0(r + 1) = q^n P_1(r).$

If $\Pi$ denotes the probability of eventually accepting the lot

(4)  $\Pi = \sum_{m \le c} P(m, n_0) + q^n [P_1(0) + P_1(1) + P_1(2) + \cdots],$

where the first term on the right is the probability of accepting on the initial
sample and may be evaluated by means of (1). Furthermore

(5)  $P_i(0) = P(c + i, n_0)$

and is by (1) the probability of finding c + i defects in the initial sample.

According to the notation (3) the probability of finding a total of c + r + 1
+ i defects in initial plus r + 1 additional samples, that is i more defects than
the acceptance level, is $P_i(r + 1)$. These probabilities may be expressed as
linear combinations of the probabilities (3) with coefficients that are
probabilities of the type (1). Thus

(6)  $P_i(r + 1) = \sum_j P(i - j + 1, n)\,P_j(r)$

where the sum may be made to extend for j = 1, 2, ..., k, provided one defines
(1) as equal to zero for negative m. By repeated application of this linear
transformation it is possible to express the probabilities (3) for additional samples in
terms of the probabilities (5) for the initial sample. Thus if M denotes the k × k
square matrix with elements

(7)  $M_{ij} = P(i - j + 1, n) \qquad (i, j = 1, \cdots, k),$

by omitting subscripts and regarding P(r) as a vector with elements given by
(3), the linear transformation may be written

(8)  $P(r + 1) = M P(r).$

Hence by repeated application of (8)

(9)  $P(r) = M^r P(0) \qquad (r = 0, 1, 2, \cdots)$

provided the zero power of the matrix M is defined as the identity matrix I.
The probability $P_i(r)$ cannot exceed the probability of finding exactly c +
r + i defects in a single sample of $n_0 + rn$ units, that is, in the notation of (1),
the probability $P(c + r + i, n_0 + rn)$. Since the latter probabilities approach
zero as r approaches infinity it follows that the limit of the elements of P(r) as r
approaches infinity is zero. Thus with this multiple sampling procedure a lot
is eventually either accepted or rejected. Furthermore since the matrix M
contains no negative elements and P(0) may be chosen with all positive elements
it follows that the elements of $M^r$ approach zero as r approaches infinity or

(10)  $\lim_{r \to \infty} M^r$ = “0”, the zero matrix.
It can be demonstrated that since the limit (10) is the zero matrix the sum of
the infinite geometrical series in the matrix M

(11)  $I + M + M^2 + \cdots = (I - M)^{-1},$

where the right member is the reciprocal of the matrix I − M. Consequently
the infinite sum of vectors

(12)  $V = \sum_{r \ge 0} P(r) = (I - M)^{-1} P(0).$

This infinite sum of vectors has elements $V_1, V_2, \cdots, V_k$ of which the first
element is the sum in brackets occurring in the right member of (4). Hence the
probability of eventually accepting the lot

(13)  $\Pi = \sum_{m \le c} P(m, n_0) + q^n V_1,$

and is thus by (12) and (5) expressible in terms of probabilities for the initial
sample, equations (1), and the reciprocal of the matrix I − M.

In addition to the probability for acceptance one is also interested in the
expected number, E, of additional samples. Since

$\sum_i P_i(r - 1) \qquad (r = 1, 2, 3, \cdots),$

where the sum extends over all i = 1, 2, ..., k, is the probability of continuing
to the r-th sample, it follows that

$\sum_i P_i(r - 1) - \sum_i P_i(r)$

is the probability that the lot will be either accepted or rejected on the r-th sample.
Therefore the expected number of additional samples

$E = \sum_{r \ge 1} r \left[ \sum_i P_i(r - 1) - \sum_i P_i(r) \right] = \sum_{r \ge 0} \sum_i P_i(r),$

or, on interchanging the order of summation and applying (12),

(14)  $E = \sum_i V_i.$


That is, the expected number of additional samples equals the sum of the ele- 
ments of the vector V. 
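
The matrix formulation (7) to (14) can be sketched in a few lines. The code below builds M and P(0), solves (I − M)V = P(0) by ordinary Gaussian elimination, and evaluates (13) and (14); a second routine simply iterates (8) many times as a cross-check. The function names, the elimination routine, and the plan parameters (p = 0.05, $n_0$ = 20, n = 10, c = 0, k = 3) are illustrative assumptions, not part of the article.

```python
from math import comb


def binom_pmf(m, n, p):
    """P(m, n) of eq. (1); zero outside 0 <= m <= n."""
    if m < 0 or m > n:
        return 0.0
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, k):
            f = aug[row][col] / aug[col][col]
            for cc in range(col, k + 1):
                aug[row][cc] -= f * aug[col][cc]
    x = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (aug[i][k] - s) / aug[i][i]
    return x


def multiple_sampling(p, n0, n, c, k):
    """Acceptance probability (13) and expected additional samples (14)."""
    q = 1.0 - p
    M = [[binom_pmf(i - j + 1, n, p) for j in range(1, k + 1)]
         for i in range(1, k + 1)]                                   # eq. (7)
    P0 = [binom_pmf(c + i, n0, p) for i in range(1, k + 1)]          # eq. (5)
    I_minus_M = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(k)]
                 for i in range(k)]
    V = solve(I_minus_M, P0)                                         # eq. (12)
    accept = sum(binom_pmf(m, n0, p) for m in range(c + 1)) + q ** n * V[0]
    return accept, sum(V)


def by_iteration(p, n0, n, c, k, steps=2000):
    """Cross-check: apply P(r+1) = M P(r) repeatedly and accumulate (4)."""
    q = 1.0 - p
    state = [binom_pmf(c + i, n0, p) for i in range(1, k + 1)]
    accept = sum(binom_pmf(m, n0, p) for m in range(c + 1))
    expected = 0.0
    for _ in range(steps):
        accept += q ** n * state[0]
        expected += sum(state)
        state = [sum(binom_pmf(i - j + 1, n, p) * state[j - 1]
                     for j in range(1, k + 1)) for i in range(1, k + 1)]
    return accept, expected


acc, exp_add = multiple_sampling(p=0.05, n0=20, n=10, c=0, k=3)
acc_it, exp_it = by_iteration(p=0.05, n0=20, n=10, c=0, k=3)
print(acc, exp_add, acc_it, exp_it)
```

For small k the direct solve is entirely adequate; the iteration is included only to confirm that the series (11) has been summed correctly.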

Though it is possible to develop a general expression for the reciprocal of the
matrix I − M, to determine the acceptance probability, $\Pi$, as well as the expected
number of additional samples it is only necessary to evaluate V. Now by (12) this
vector is the solution of the linear system of equations

(15)  $(I - M)V = P(0).$

Though for k small this system could be solved directly, in order to find a form
of the solution applicable for any value of k, let the expansion in power series in
x be

(16)  $\frac{1}{(px + q)^n - x} = g_1 + g_2 x + g_3 x^2 + \cdots,$

where the coefficients, g, are functions of p and q. On clearing of fractions and
equating coefficients of like powers of x it is found that

(17)  $g_1 = q^{-n}$

and, by equating the coefficients of the first k powers of x and using the
notation (7),

(18)  $g_i - \sum_j M_{ij} g_j = \begin{cases} 0 & (i = 1, 2, \cdots, k - 1), \\ q^n g_{k+1} & (i = k). \end{cases}$
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Similarly, if the expansion in power series of 
Dd P;(0)z' 
(px + q)"— 2 


where the sum is for alli = 1, --- , k, then by clearing of fractions and equating 
coefficients of like powers of z it is found that 


(20) hi = 0, 


and 


= chy = £7 Pi(0) baer 
(21) h; Be Mish; fee + hess (¢ = k). 


It follows from equations (18) and (21) that if 


(19) =h+ hex t+hsz* + -->, 


(22) Vi = gihess/gesa — hi (ij =1,---,k), 


then V, the vector with these elements, will satisfy equation (15). Since by (17) and (20)

(23) $V_1 = q^{-n} h_{k+1}/g_{k+1},$

the probability for eventually accepting the lot is by (13) expressible as

(24) $\Pi = \sum_{m \le c} P(m, n_0) + h_{k+1}/g_{k+1},$

while the expected number of additional samples is the sum of elements (22) of V.

These results will now be summarized and simplified formulae derived for 
special cases. In the summary all probabilities are expressed by means of (5) 
in terms of the probabilities (1). 


3. Summary of multiple sampling formulas. For this multiple sampling procedure the initial sample is $n_0$ and the additional samples are n. A lot is accepted if on the r-th additional sample the total number of defects found is at most c + r and rejected if the total exceeds c + k + r. An infinite lot containing a fraction p of defects is either accepted or rejected, the probability of acceptance being given by

(25) $\Pi = \sum_{m \le c} \binom{n_0}{m} p^m q^{n_0 - m} + h_{k+1}/g_{k+1}$  (q = 1 - p),

and the probability of rejection is $1 - \Pi$. The expected number of additional samples is

(26) $E = \dfrac{h_{k+1}}{g_{k+1}} \sum_i g_i - \sum_i h_i,$

where the sum extends over i = 1, 2, ···, k. The $g_i$ and $h_i$ are the coefficients in power series of x in the expansions of:

(27) $\dfrac{1}{(px+q)^n - x} = g_1 + g_2 x + g_3 x^2 + \cdots,$

(28) $\dfrac{\sum_i \binom{n_0}{c+i} p^{c+i} q^{n_0-c-i} x^i}{(px+q)^n - x} = h_1 + h_2 x + h_3 x^2 + \cdots,$

where the sum is for all i = 1, 2, ···, k. These formulae apply to all finite values of c and k provided the binomial coefficient is zero for values of the argument falling outside those occurring in the ordinary expansion of an integral power of a binomial.
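The summary formulas can be evaluated directly from the power series. The sketch below obtains the coefficients by clearing fractions in (27) and (28) and then applies (25) and (26); the function names, the 0-based storage (`g[j]` holds $g_{j+1}$) and the use of exact fractions are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def series_coefficients(n0, n, c, k, p, terms):
    """Return lists g, h with g[j] = g_{j+1} and h[j] = h_{j+1} from (27) and (28)."""
    q = 1 - p
    # Denominator (px + q)^n - x as polynomial coefficients d[0..n].
    d = [comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]
    d[1] -= 1
    # Numerator of (28): sum over i = 1..k of C(n0, c+i) p^(c+i) q^(n0-c-i) x^i.
    s = [Fraction(0)] * terms
    for i in range(1, min(k, terms - 1) + 1):
        m = c + i
        if 0 <= m <= n0:
            s[i] = comb(n0, m) * p**m * q**(n0 - m)
    g, h = [], []
    for j in range(terms):
        # Clearing fractions: sum over m of d[m] * coeff[j - m] equals the
        # numerator coefficient of x^j (1 at j = 0 for g, s[j] for h).
        tail_g = sum(d[m] * g[j - m] for m in range(1, min(j, n) + 1))
        tail_h = sum(d[m] * h[j - m] for m in range(1, min(j, n) + 1))
        g.append(((1 if j == 0 else 0) - tail_g) / d[0])
        h.append((s[j] - tail_h) / d[0])
    return g, h

def summary_formulas(n0, n, c, k, p):
    """Acceptance probability (25) and expected number of additional samples (26)."""
    q = 1 - p
    g, h = series_coefficients(n0, n, c, k, p, terms=k + 1)
    ratio = h[k] / g[k]  # h_{k+1} / g_{k+1}
    initial = sum(comb(n0, m) * p**m * q**(n0 - m) for m in range(0, min(c, n0) + 1))
    Pi = initial + ratio
    E = ratio * sum(g[:k]) - sum(h[:k])
    return Pi, E

Pi, E = summary_formulas(n0=4, n=2, c=0, k=3, p=Fraction(3, 5))
print(Pi, E)
```

Passing `p` as a `Fraction` keeps every coefficient exact, which is convenient for checking the formulas on small plans.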


4. Computation of coefficients g and h. If the denominator in (27) is first expanded in power series in

$x(px + q)^{-n}$

and then the resulting negative powers of binomials expanded in power series in x, it is found that

(29) $g_k = \sum_{m=0}^{k-1} (-1)^m \binom{(k-m)n + m - 1}{m} p^m q^{-(k-m)n - m},$  k ≥ 1.

By (28) the coefficients h are expressible in terms of the g's,

(30) $h_1 = 0,$
     $h_{k+1} = \sum_{i=1}^{k} \binom{n_0}{c+i} p^{c+i} q^{n_0-c-i} g_{k-i+1},$  k ≥ 1.
Other expressions for the coefficients may be derived from the theory of functions of a complex variable. Thus by Cauchy's Integral Formula

(31) $g_{k+1} = \dfrac{1}{2\pi\sqrt{-1}} \oint_C \dfrac{dx}{x^{k+1}[(px+q)^n - x]},$
     $h_{k+1} = \dfrac{1}{2\pi\sqrt{-1}} \oint_C \dfrac{S(x)\,dx}{x^{k+1}[(px+q)^n - x]},$

where

(32) $S(x) = \sum_i \binom{n_0}{c+i} p^{c+i} q^{n_0-c-i} x^i$

and the closed path of integration C in the complex plane only includes the pole at the origin. Since the integrands are rational functions and the point at infinity is not a singularity for either integrand, these integrals taken about the origin are equal to the negative sum of the corresponding integrals taken about the zeros of

$(px + q)^n - x.$

If $p \ne n^{-1}$ it can be demonstrated that there are n distinct zeros $x_1, x_2, \cdots, x_n$ corresponding to the solutions of the algebraic equation

(33) $(px_s + q)^n = x_s.$

One solution is obviously

(34) $x_1 = 1,$

and for $p = n^{-1}$ this solution is a double root.

The integrals about these zeros are obtainable from Cauchy's Integral Formula and after integrating and simplifying the resulting sum by means of (33) it is found that for the case $p \ne n^{-1}$,

(35) $g_{k+1} = \dfrac{1}{1 - np} + \sum_{s=2}^{n} \dfrac{px_s + q}{x_s^{k+1}[q - (n-1)px_s]},$
     $h_{k+1} = \dfrac{S(1)}{1 - np} + \sum_{s=2}^{n} \dfrac{(px_s + q)S(x_s)}{x_s^{k+1}[q - (n-1)px_s]}.$
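The first formula of (35) can be checked exactly in the simplest case n = 2, where the only root other than unity is $x_2 = (q/p)^2$ (the product of the two roots of $p^2x^2 + (2pq - 1)x + q^2 = 0$). The sketch below compares the closed form with the coefficients generated from (27); the function names are illustrative, and the restriction to n = 2 is an assumption made only to keep the roots rational.

```python
from fractions import Fraction

def g_series(n_terms, p):
    """g_1 .. g_{n_terms} for n = 2 from the recurrence implied by (27)."""
    q = 1 - p
    d0, d1, d2 = q * q, 2 * p * q - 1, p * p  # coefficients of (px+q)^2 - x
    g = []
    for j in range(n_terms):
        rhs = 1 if j == 0 else 0
        if j >= 1:
            rhs -= d1 * g[j - 1]
        if j >= 2:
            rhs -= d2 * g[j - 2]
        g.append(Fraction(rhs) / d0)
    return g

def g_closed_form(k, p):
    """g_{k+1} from (35) with n = 2, where the only other root is x_2 = (q/p)^2."""
    n, q = 2, 1 - p
    x2 = (q / p) ** 2
    return 1 / (1 - n * p) + (p * x2 + q) / (x2 ** (k + 1) * (q - (n - 1) * p * x2))

p = Fraction(3, 5)  # np = 6/5, so the case np = 1 is avoided
series = g_series(6, p)
closed = [g_closed_form(k, p) for k in range(6)]
print(series == closed)
```

For n greater than 2 the remaining roots are irrational or complex, so the same comparison would have to be carried out in floating point.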


If the power series (27) is multiplied by the series

$(1 - x)^{-1} = 1 + x + x^2 + x^3 + \cdots,$

the resulting product is

$\dfrac{1}{(1 - x)[(px+q)^n - x]} = g_1 + (g_1 + g_2)x + (g_1 + g_2 + g_3)x^2 + \cdots,$

so that, by Cauchy's Integral Formula,

(36) $\sum_{i=1}^{k} g_i = \dfrac{1}{2\pi\sqrt{-1}} \oint_C \dfrac{dx}{x^k (1 - x)[(px+q)^n - x]}.$

Similarly the sum of the coefficients h that occur in the right member of (26) may be written

(37) $\sum_{i=1}^{k} h_i = \dfrac{1}{2\pi\sqrt{-1}} \oint_C \dfrac{S(x)\,dx}{x^k (1 - x)[(px+q)^n - x]}.$

The integrals (36) and (37) are of the same type as (31), and by employing the same method of integrating used in deriving (35), the following expressions for the sums of coefficients g and h occurring in (26) are obtained:

(38) $G_k = \sum_i g_i = \dfrac{k}{1 - np} - \dfrac{n(n-1)p^2}{2(1 - np)^2} + \sum_{s=2}^{n} \dfrac{px_s + q}{x_s^k (1 - x_s)[q - (n-1)px_s]},$
     $H_k = \sum_i h_i = \dfrac{kS(1) - S'(1)}{1 - np} - \dfrac{n(n-1)p^2 S(1)}{2(1 - np)^2} + \sum_{s=2}^{n} \dfrac{(px_s + q)S(x_s)}{x_s^k (1 - x_s)[q - (n-1)px_s]},$

provided np ≠ 1. Here S'(1) is the derivative of (32) with respect to x evaluated for x = 1. For the special case np = 1, two of the roots of (33)

$x_1 = x_2 = 1,$

and the integrals (36), (37) and (38) become respectively

(39) $(n-1)g_{k+1} = 2kn + \tfrac{8}{3}n - \tfrac{4}{3} + \sum_{s=3}^{n} \dfrac{n + x_s - 1}{x_s^{k+1}(1 - x_s)},$

     $(n-1)h_{k+1} = (2kn + \tfrac{8}{3}n - \tfrac{4}{3})S(1) - 2nS'(1) + \sum_{s=3}^{n} \dfrac{(n + x_s - 1)S(x_s)}{x_s^{k+1}(1 - x_s)},$

     $(n-1)\sum_i g_i = k^2 n + \tfrac{5}{3}kn + \tfrac{1}{18}n - \tfrac{4}{3}k - \tfrac{1}{18} - \tfrac{1}{9}n^{-1} + \sum_{s=3}^{n} \dfrac{n + x_s - 1}{x_s^k (1 - x_s)^2},$

     $(n-1)\sum_i h_i = (k^2 n + \tfrac{5}{3}kn + \tfrac{1}{18}n - \tfrac{4}{3}k - \tfrac{1}{18} - \tfrac{1}{9}n^{-1})S(1) - (\tfrac{2}{3}n - \tfrac{4}{3} + 2kn)S'(1) + nS''(1) + \sum_{s=3}^{n} \dfrac{(n + x_s - 1)S(x_s)}{x_s^k (1 - x_s)^2},$

where the sum extends over all roots of (33) that are not equal to unity. Here S'(1) and S''(1) are the first and second derivatives of (32) with respect to x for x = 1 and $p = n^{-1}$.

Formulas (35), (38) and (39) require for their evaluation the solutions of equation (33). For n greater than unity there are just two positive real solutions, say $x_1 = 1$ and $x_2$. If n is even all other roots are complex numbers, while if n is odd they are complex with the exception of one negative real root. Consequently by (33) for s = 3, 4, ···, n the absolute values of the roots satisfy the inequality

$(p|x_s| + q)^n > |x_s|,$

and consequently the $|x_s|$ cannot be between $x_1$ and $x_2$. But equation (33) may be written

$\dfrac{(px_s + q)^n - 1}{(px_s + q) - 1} = \dfrac{1}{p}$  (s ≠ 1)

so that for s = 2, 3, ···, n

(40) $\sum_i (px_s + q)^i = 1/p$

where the sum is taken for i = 0, 1, ···, n - 1 and therefore

$\sum_i (p|x_s| + q)^i > 1/p$  (s = 3, 4, ···, n).

Now $x_2$ is the only real and positive solution of (40), consequently, in order to satisfy the inequality, the absolute values of roots corresponding to s = 3, 4, ···, n must exceed $x_2$. On combining this result with the former, it follows that

(41) $|x_s| > 1$ and $x_2$.


Consequently for large values of k the most important terms in the right members of (35), (38) and (39) correspond to the real positive roots $x_1 = 1$ and $x_2$ of equation (33). By omitting the terms corresponding to s = 3, ···, n one can derive approximations to the g and h and their sums applicable for large k values. In fact for np near unity the roots corresponding to s = 3, 4, ··· are considerably greater than unity as is illustrated in the following table of roots for the case np = 1:

n = 2, p = 1/2;  $x_1 = x_2 = 1$
n = 3, p = 1/3;  $x_3 = -8$
n = 4, p = 1/4;  $x_3, x_4 = -7 \pm 4\sqrt{-2}$
n = 5, p = 1/5;  $x_3 = -12.2531\cdots$,  $x_4, x_5 = -4.8734\cdots \pm 7.7343\cdots\sqrt{-1}$

and for s = 3, 4, 5, --- , | x. | is greater than 8. 
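The tabled roots can be checked by substitution into (33) with p = 1/n. The sketch below does this in floating-point complex arithmetic; the dictionary of roots simply restates the table (the n = 5 entries only to the four decimals printed), and the function name `residual` is an illustrative choice.

```python
def residual(x, n):
    """|(p x + q)^n - x| with p = 1/n and q = 1 - 1/n, i.e. the case np = 1."""
    p = 1.0 / n
    q = 1.0 - p
    return abs((p * x + q) ** n - x)

# Roots other than the double root x = 1, as listed in the table.
tabled_roots = {
    3: [complex(-8, 0)],
    4: [complex(-7, 4 * 2 ** 0.5), complex(-7, -4 * 2 ** 0.5)],
    5: [complex(-12.2531, 0), complex(-4.8734, 7.7343), complex(-4.8734, -7.7343)],
}

for n, roots in tabled_roots.items():
    for x in roots:
        # The n = 5 residuals are limited by the four-decimal rounding of the table.
        print(n, x, abs(x), residual(x, n))
```

The residuals for n = 3 and n = 4 vanish up to rounding error, since those roots are exact; for n = 5 they are small but not zero because the entries are truncated decimals.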
For very large values of n and small values of p one can find approximate values for the roots by solving the limit equation obtained from (33) by putting

$a = np$

and letting n approach infinity. This equation is

(42) $e^{a(x-1)} = x,$

where e is the base of the natural logarithms. For the case a = 1, the roots are 1, 1, $3.0891\cdots \pm 7.4602\cdots\sqrt{-1}$, $3.66\cdots \pm 13.88\cdots\sqrt{-1}$ and


t= ee (b — /—1) + b\/—1 approximately, 


$b = (2u + 1/2)\pi$, u = 4, 5, 6, ···.
From equation (39) and these numerical results it follows that even with k as
small as 3 the percentage error for the case np = 1 introduced in g by omitting 
the terms in the indicated sum is less than .002%. Consequently for all practi- 
cal purposes one may omit the complex and negative roots for values of k greater 
than 3 in computing the g’s for np in the neighborhood of unity. For smaller 
values of k the exact values of the g’s are readily obtainable from (29). 


5. Special cases. Consider first the case in which c < 0 and $n_0 \le k + c$. With these conditions, under no circumstances could a lot be accepted or rejected on the initial sample and the indicated sum in the right member of (25) is zero. Furthermore for this case the sum (32) becomes

(43) $S(x) = (px + q)^{n_0} x^{-c}.$

Consequently it follows from (33) that

(44) $S(x_s) = x_s^{t-c},$

where

(45) $t = n_0/n.$

It should be noted however, that for t not an integer the right member of (44) is multiple valued and one must take that value for which

(46) $x_s^{t} = (px_s + q)^{n_0}.$

Thus for real positive values of $x_s$, the right member of (44) is real. For integral values of t there is of course no ambiguity in the notation.

If (44) is substituted in the second equation of (35), the resulting expression for the h coefficient is of the same form as that for the g coefficient, in fact

$h_{k+1} = g_{k-t+c+1}$

so that by (25) the probability for acceptance is for this case

(47) $\Pi = g_{k-t+c+1}/g_{k+1}.$


In similar manner it follows from (43) and (46) that the sum of the h coefficients, equation (38),

$\sum_i h_i = G_{k-t+c} + t$

and hence by (26) the expected number of additional samples

(48) $E = \Pi G_k - G_{k-t+c} - t.$

Since the initial sample is nt units and the additional samples are all equal to n units, the expected total number of units sampled, that is, initial plus additional samples, is

(49) $I = n_0 + nE = n(\Pi G_k - G_{k-t+c}).$

Since for this case it is impossible to accept or reject on the initial sample one could combine the initial sample with the first additional sample. In fact one can continue combining initial and additional samples and thus increasing c and t provided the new initial sample $n_0$ and the new c value thus obtained are such that

(50) $c \le 0, \quad n_0 = nt \le k + n - 1 + c.$

In this process of combining samples t and c increase at the same rate and consequently formula (47), and the right member of (49) are unchanged. In other words formulas (47) and (49) may also be used under conditions (50).

It was demonstrated in Section 4 that for k sufficiently large one can omit those terms in (35) and (38) corresponding to complex or negative roots of (33). If this is done the following useful approximations for the g and G are obtained:

(51) $g_k \approx (1 - np)^{-1} + (px + q)[q - (n-1)px]^{-1} x^{-k},$
     $G_k \approx k(1 - np)^{-1} - \tfrac{1}{2}n(n-1)p^2(1 - np)^{-2} + (px + q)[q - (n-1)px]^{-1}(1 - x)^{-1} x^{-k},$

provided np ≠ 1, k ≥ 1 and x is the real positive root of

(52) $(px + q)^n = x$  (np ≠ 1)

that is not equal to unity. For np = 1 these approximations become by (39)

(53) $(n-1)g_k \approx 2kn + 2n/3 - 4/3,$
     $(n-1)G_k \approx k^2 n + 5kn/3 + n/18 - 4k/3 - 1/18 - n^{-1}/9,$  k ≥ 1.

These formulae in conjunction with formulae (47) and (49) give quite satisfactory approximations for the probability for acceptance $\Pi$ and the expected total number of units sampled even when values of the subscripts employed are as small as 3. Of course the larger the value of k in (51), (52) or (53) the better these approximations.

Now the root x of (52) is greater or less than unity depending on whether the product a = np is less than or greater than unity. Consequently it follows from (47) and (51) that for c = 0 and t finite

(54) $\Pi' = \lim_{k \to \infty} \Pi = \lim_{k \to \infty} g_{k-t+1}/g_{k+1}$
     $= 1,$  np < 1;
     $= x^t,$  np > 1;

while by (49) and (51) the expected total number of units sampled has the limiting value

(55) $I' = n_0(1 - np)^{-1},$  np < 1;
     $I' = \infty,$  np > 1.

But k infinite implies that under no circumstance can a lot be rejected. Consequently $\Pi'$ and $I'$ are the exact values of the probability for acceptance and the expected total sample respectively for the following sampling procedure:

The initial sample is $n_0 = nt$ and all additional samples are n. The lot is accepted if on the initial sample no defects are found or if after taking r additional samples a total of exactly r defects is found.

In inspection problems p is usually small and n large so that the approximation (42) may be used to determine the real positive root x, thus

(56) $e^{a(x-1)} = x$  (a = np).
It then follows from (54) and (55) that for np > 1 


(57) 


These relations are of course equivalent to (54) and (56). Suppose that the probability $\Pi'$ and the fraction p are assigned. Then the initial sample $n_0$, and additional sample n, will depend on only the parameter x. Consider next the problem of sampling a number of lots that fall into two categories, namely those containing a fraction p of defects and those containing a fraction p* of defects where p* < p. If in addition the sampling procedure is to be such that lots with fraction p* of defects are eventually accepted, but lots with fraction p of defects have a small assigned probability of acceptance $\Pi'$, then whatever the value of x as long as the resulting np ≥ 1 these conditions are satisfied. Furthermore if one insists that the expected total sample for lots containing a fraction p*, namely by (55)

$I'(p^*) = n_0(1 - np^*)^{-1},$

be a minimum, then it is found that

(58) $x = p^*/p.$


This remarkably simple result is capable of still greater generalization. By an 
altogether different approach to the problem the author has succeeded in proving 
that of all possible multiple sampling procedures, the multiple sampling method 
here described and defined by equations (57) and (58) gives the minimum 
expected inspection for the problem under consideration provided n is sufficiently 
large.’ 
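The minimizing property stated in (58) can be explored numerically. The sketch below treats n and $n_0$ as continuous quantities determined by x through (56) and the np > 1 branch of (54), that is $n = \log x / [p(x-1)]$ and $n_0 = n \log \Pi' / \log x$; these two expressions are derived here from (54) and (56) as working assumptions and are not quoted from the paper. The function names and the grid search are likewise illustrative.

```python
import math

def expected_sample_good_lots(x, p, p_star, Pi_prime):
    """I'(p*) = n0 / (1 - n p*) with n and n0 fixed by (54) and (56) for x in (0, 1)."""
    n = math.log(x) / (p * (x - 1.0))          # from exp(n p (x - 1)) = x
    n0 = n * math.log(Pi_prime) / math.log(x)  # from x^(n0 / n) = Pi'
    if n * p_star >= 1.0:
        return math.inf  # lots of quality p* would no longer be accepted with certainty
    return n0 / (1.0 - n * p_star)

def best_x(p, p_star, Pi_prime, steps=1000):
    """Grid search over x in (0, 1) for the smallest expected sample at quality p*."""
    grid = [i / steps for i in range(1, steps)]
    return min(grid, key=lambda x: expected_sample_good_lots(x, p, p_star, Pi_prime))

p, p_star, Pi_prime = 0.05, 0.01, 0.10
x_opt = best_x(p, p_star, Pi_prime)
print(x_opt, p_star / p)
```

With these inputs the grid minimum falls at the ratio p*/p, in agreement with (58); in practice n and $n_0$ must of course be rounded to integers.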

By letting both —c and k approach infinity it is possible to derive probability 
formulae for sampling procedure in which a lot is either rejected or the sampling 
continues without end. These formulae are included in Table I along with 
other special cases derived from previously listed general formulae. 


1 Note: The author has postponed publication of this proof in the hope that it might be generalized to include sampling problems involving both acceptance and rejection of a lot.

TABLE I
Notation:

n = number of units in each additional sample
$n_0$ = number of units in initial sample
p = fraction defective in lot
a = np
q = 1 - p
c = maximum number of defects in initial sample for acceptance
t = $n_0/n$ = ratio initial sample to additional samples
c + k + 1 = minimum number of defects in initial sample for rejection
c + r = maximum number of defects in initial plus first r additional samples for acceptance
c + k + 1 + r = minimum number of defects in initial plus first r additional samples for rejection
$\Pi$ = probability of eventually accepting lot with fraction p defects
$1 - \Pi$ = probability of eventually rejecting lot with fraction p defects
I = expected total number of units sampled (i.e., initial plus whatever additional samples are sampled)
x = real positive root different from unity of the equation $(px + q)^n = x$.


Conditions II 


k=1 1 
(a)c =0 | gt x La =n.) pat 
= 2 


1 — npg" 





q’(1 — npg” *)* n(1 — npg™')™ 


er 


/Qr+r no + nq”? "Gi/ esr 


” iin 


(1 — npq”™")* mo + ng"*(1 — npg” "*)”* 





Pernice cement 


eo [Ny on 
n(n + 1) 22 ona ___ng’(1 + gq" — npg”) 


2 : — 2npqr + n(n - = + 1) por? 





1 _ 2npq" + 





TABLE I (Concluded)


Conditions 0 I 
k 0 fornp>1 in +nz(1—2z)" fornp>1 


qg’’ "(1 — np) for np < 1* oo for np <1 


es 


1 m(21I — 1) 
1 + (p/q)”° q-p 


Ox / Ges n(TIG, — Gx_1) 


(j) c= 1 (np < 1) m(1 — np) (np < 1) 
Vk =o z™!™ (np > 1) 2 (np > 1) 
*In this sampling procedure a lot cannot be accepted so that $\Pi$ is the probability that additional samples will be taken without end. The probability of rejecting lot is however $1 - \Pi$.


TABLE II 
Values of g and G for Limitn = ~,p =0 


0.2558 | 0.4024 0.6931 ‘ | 1.3863 2.0118 2.5584 


7.477} 12.915 
10.455| 40.86 | 133.76 
23.48 | 208.2 |1343.2 
49.55 |1045. 13.4 X 10° 

| 101.70 \5228. | 134 x 10° 


oo 





2.000 | 2.718 | 7.477| 12.915 
4.614 | 7.389 | 14.45] 48.34 | 146.7 
.795 7.549 | 14.05 | 37.93 | 256.5 [1490 

.467 | 10.65 | 22.72 | 87.5 /1301. 14.9 x 10° 
-140 | 13.82 | 33.39 | 189.2 |6529. 149 x 10° 


As an illustration of the method of application of these formulae, suppose that the sampling procedure is to be such that the probability, $\Pi$, of accepting a "p" value of 0.5 + ε equals the probability of rejecting a "p" value of 0.5 - ε. This condition on probabilities is by Table I, formula (g), always satisfied if c = 0, n = 2, and $n_0 = k + 1$. This corresponds to a multiple sampling scheme in which additional samples are only two units each and a lot is accepted or rejected on initial sample if none or all units are defective. With ε = 0.1 and $\Pi < 1/6$, one can take $n_0 = 4$ and k = 3. The expected total number of units examined depends on "p" and varies for this numerical case from 4, for p = 0 or 1, to a maximum of 16, for p = 0.5. Nevertheless a single sample plan satisfying the same conditions would require a sample of 23 units whatever the value of p.
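The figures quoted in this illustration can be reproduced from the entry (g) of Table I, read here as $\Pi = 1/[1 + (p/q)^{n_0}]$ and $I = n_0(2\Pi - 1)/(q - p)$. That reading of the table entry, the function names, and the explicit handling of the limit at p = 1/2 are assumptions of this sketch.

```python
def acceptance_probability(p, n0):
    """Formula (g) for c = 0, n = 2, n0 = k + 1: Pi = 1 / (1 + (p/q)^n0)."""
    q = 1.0 - p
    if q == 0.0:
        return 0.0  # every unit defective: the lot is rejected on the initial sample
    return 1.0 / (1.0 + (p / q) ** n0)

def expected_total_sample(p, n0):
    """Formula (g): I = n0 (2 Pi - 1) / (q - p), with the limiting value n0^2 at p = 1/2."""
    q = 1.0 - p
    if abs(q - p) < 1e-12:
        return float(n0 * n0)
    return n0 * (2.0 * acceptance_probability(p, n0) - 1.0) / (q - p)

n0 = 4  # so that k = 3
for p in (0.0, 0.4, 0.5, 0.6, 1.0):
    print(p, acceptance_probability(p, n0), expected_total_sample(p, n0))
```

The symmetry of the plan shows up as `acceptance_probability(0.6, 4)` being equal to `1 - acceptance_probability(0.4, 4)`, which is the stated condition with ε = 0.1.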

The previous problem is, however, not typical of those encountered in com- 
mercial inspection for in such situations p is usually very small. In practice 
one can generally replace the formulae in Table I by their limiting values for n = ∞, p = 0, and np = a. Table II gives the limiting values of the g and G as well as x for a small number of values of a.

Finally the justification for multiple sampling lies in the fact that a reduction 
in the expected total sample is possible. Though this paper is limited to the con- 
sideration of a very elementary type of sampling, it indicates that it might be 
worth while to investigate the possibility of utilizing the methods of multiple 
sampling in inspection for variables. Unfortunately serious mathematical 
difficulties are even encountered in so simple a problem as multiple sampling 
from a normal population for the mean. 





AN EXACT TEST FOR RANDOMNESS IN THE NON-PARAMETRIC CASE 
BASED ON SERIAL CORRELATION¹


By A. Wald and J. Wolfowitz


Columbia University 


1. Introduction. A sequence of variates $x_1, \cdots, x_N$ is said to be a random series, or to satisfy the condition of randomness, if $x_1, \cdots, x_N$ are independently distributed with the same distribution; i.e., if the joint cumulative distribution function (c.d.f.) of $x_1, \cdots, x_N$ is given by the product $F(x_1) \cdots F(x_N)$ where F(x) may be any c.d.f.

The problem of testing randomness arises frequently in quality control of manufactured products. Suppose that x is some quality character of a product and that $x_1, x_2, \cdots, x_N$ are the values of x for N consecutive units of the product arranged in some order (usually in the order they were produced). The production process is said to be in a state of statistical control if the sequence $(x_1, \cdots, x_N)$ satisfies the condition of randomness. A number of tests of randomness have been devised for purposes of quality control, all having the following features in common: 1) They are based on runs in the sequence $x_1, \cdots, x_N$. 2) The test procedure is invariant under topologic transformation of the x-axis, i.e., the test procedure leads to the same result if the original variates $x_1, \cdots, x_N$ are replaced by $x'_1, \cdots, x'_N$ where $x'_\alpha = f(x_\alpha)$ and f(t) is any continuous and strictly monotonic function of t. 3) The size of the critical region, i.e., the probability of rejecting the hypothesis of randomness when it is true, does not depend on the common c.d.f. F(x) of the variates $x_1, \cdots, x_N$. Condition (3) is a fortiori fulfilled if condition (2) is satisfied and if F(x) is continuous. The fulfillment of condition (3) is very desirable, since in many practical applications the form of the c.d.f. F(x) is unknown.

Tests of randomness are of importance also in the analysis of time series (particularly of economic time series) where they are frequently based on the so-called serial correlation. The serial correlation coefficient with lag h is defined by the expression² (see, for instance, Anderson [1])

$R_h = \dfrac{\sum_{\alpha=1}^{N} x_\alpha x_{h+\alpha} - \left(\sum_{\alpha=1}^{N} x_\alpha\right)^2 / N}{\sum_{\alpha=1}^{N} x_\alpha^2 - \left(\sum_{\alpha=1}^{N} x_\alpha\right)^2 / N},$

where $x_{h+\alpha}$ is to be replaced by $x_{h+\alpha-N}$ for all values of α for which h + α > N.
The distribution of $R_h$ has recently been studied by R. L. Anderson [1], T. Koopmans [2], L. C. Young [3], J. v. Neumann [4, 5], B. I. Hart and J. v. Neumann [6], and J. D. Williams [7], under the assumption that $x_1, \cdots, x_N$ are independently distributed with the same normal distribution. Thus, in addition to the randomness of the series $(x_1, \cdots, x_N)$ it is assumed that the common c.d.f. of the variates $x_1, \cdots, x_N$ is normal. This is a restrictive assumption since frequently the form of the common c.d.f. F(x) of the variates $x_1, \cdots, x_N$ is unknown.

1 Presented to the Institute of Mathematical Statistics and the American Mathematical Society at a joint meeting at New Brunswick, New Jersey, on September 13, 1943.
2 Some authors (see, for instance, [2] p. 27, equation (61)) use a non-circular definition

The purpose of this paper is to develop a test procedure based on $R_h$ such that (a) if F(x) is continuous the size of the critical region does not depend on the common c.d.f. F(x) of the variates $x_1, \cdots, x_N$, thus making an exact test of significance possible also when nothing is known about F(x) except its continuity; (b) if F(x) is not continuous, but all its moments are finite and its variance is positive, the size of the critical region approaches, as N → ∞, the value it would have if F(x) were continuous. Thus in the limit an exact test is possible in this case as well. We will refer to the case where the form of F(x) is unknown as the non-parametric case, in contrast to the case when it is known that F(x) is a member of a finite parameter family of c.d.f.'s.

The test based on the serial correlation seems to be suitable if the alternative 
to randomness is the existence of a trend’ or of some regular cyclical movement in 
the data. In the analysis of time series it is frequently assumed that this is the 
case and this is perhaps the reason why tests based on serial correlation are 
widely used in the analysis of time series. In quality control of manufactured 
products the existence of a trend is often considered as the alternative to random- 
ness, caused perhaps by the steady deterioration of a machine in the production 
process. Thus, tests of randomness based on serial correlation could also be 
used in quality control. 


2. An exact test procedure based on $R_h$. Let $a_\alpha$ be the observed value of $x_\alpha$ (α = 1, ···, N). Consider the subpopulation where the set $(x_1, \cdots, x_N)$ is restricted to permutations of $a_1, \cdots, a_N$. In this subpopulation the probability that $(x_1, \cdots, x_N)$ is any particular permutation $(a'_1, \cdots, a'_N)$ of $(a_1, \cdots, a_N)$ is equal to 1/N! if the hypothesis to be tested, i.e., that of randomness, is true. (If two of the $a_i$ (i = 1, 2, ···, N) are identical we assume that some distinguishing index is attached to each so that they can then be regarded as distinct and so that there still are N! permutations of the elements $a_1, \cdots, a_N$.)

The probability distribution of $R_h$ in this subpopulation can be determined as follows: Consider the set of N! values of $R_h$ which are obtained by substituting for $(x_1, \cdots, x_N)$ all possible permutations of $(a_1, \cdots, a_N)$. (A value which occurs more than once is counted as many times as it occurs.) Each of these values of $R_h$ has the probability 1/N!. On the basis of this distribution of $R_h$ an exact test of significance can be carried out. Suppose that $\alpha$ is the level of significance, i.e., the size of the critical region. We choose as critical region a subset of M values out of the set of N! values of $R_h$ where M/N! = $\alpha$. The subset of M values which constitute the critical region will depend in each particular problem on the possible alternatives to randomness. For example, if a linear trend is the only possible alternative to randomness, then the critical region will consist of the M largest values⁴ of $R_h$. The value of the lag h will also be chosen on the basis of the alternatives under consideration. For instance, if some cyclical movement in the data is suspected the choice of h will depend on the form of these cycles. The general idea underlying the choice of the subset of M values and of the lag is to make the power of the test with respect to the alternatives which are particularly feared as high as possible.

3 If the existence of a trend is feared it may be preferable to use the non-circular statistic discussed, for example, in [2].

If $R_h$ has the same value for several permutations of $(a_1, \cdots, a_N)$, it may be impossible to have a critical region consisting of exactly M values of $R_h$. For example, if $a_1 = a_2 = \cdots = a_N$, then all the N! values of $R_h$ are equal, and the number of values of $R_h$ included in the critical region must be either 0 or N!. If F(x) is continuous the probability that two values of $R_h$ be equal is zero. This explains why an exact test is always possible when F(x) is continuous. On the other hand, if F(x) is not continuous, the probability that several values of $R_h$ be equal is positive. However, the theorem we shall prove in Section 4 shows that in the limit an exact test is possible even when F(x) is not continuous, but has finite moments and a positive variance. For if the latter is true, the probability is one that the weaker conditions for the validity of our theorem (given at the end of Section 4) will be fulfilled.

Consider the statistic

(2) $\bar{R}_h = \sum_{\alpha=1}^{N} x_\alpha x_{h+\alpha}$

where $x_{h+\alpha}$ is to be replaced by $x_{h+\alpha-N}$ for all values of α for which h + α > N. Since in the subpopulation under consideration $\sum_{\alpha=1}^{N} x_\alpha$ and $\sum_{\alpha=1}^{N} x_\alpha^2$ are constants, the statistic $R_h$ is a linear function of $\bar{R}_h$ in this subpopulation. Hence, the test based on $R_h$ is equivalent to the test based on $\bar{R}_h$. Since $\bar{R}_h$ is simpler than $R_h$, in what follows we shall restrict ourselves to the statistic $\bar{R}_h$.
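For very small N the exact procedure can be carried out by direct enumeration. The following is a minimal sketch that uses the sum of circular lagged products (equivalent, within the subpopulation, to the serial correlation coefficient) and an upper-tail critical region, as would be appropriate against a trend; the function names and the sample data are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def circular_serial_product(x, h=1):
    """Sum of x[a] * x[a + h] with indices taken modulo N (circular definition)."""
    N = len(x)
    return sum(x[a] * x[(a + h) % N] for a in range(N))

def exact_upper_tail(observed, h=1):
    """Exact probability, under randomness, of a value at least as large as the observed one."""
    r_obs = circular_serial_product(observed, h)
    # All N! permutations are enumerated; repeated values are counted as often as they occur.
    values = [circular_serial_product(perm, h) for perm in permutations(observed)]
    count = sum(1 for v in values if v >= r_obs)
    return Fraction(count, len(values)), values

p_value, values = exact_upper_tail([1, 2, 3, 4, 5, 6])
print(p_value)
```

The enumeration grows as N!, so this approach is only practical for a handful of observations; beyond that the moments or the limiting distribution have to be used.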

We shall now show that, if h is prime to N, the totality $T_h$ of the N! values taken by $\bar{R}_h$ is the same as $T_1$, the totality of the N! values taken by $\bar{R}_1$.

In the argument which follows it is to be understood that, whenever a positive integer is greater than N, it is to be replaced by that positive integer less than or equal to N which differs from it by an integral multiple of N.

Clearly it will be sufficient to show the existence of a permutation $p_1, p_2, \cdots, p_N$ of the first N integers such that

$p_i + 1 = p_{i+h}$  (i = 1, 2, ···, N).

Such a permutation is given by

$j = p_{(j-1)h+1}$  (j = 1, 2, ···, N).

For if j ≠ j' then (j - 1)h + 1 ≠ (j' - 1)h + 1 because h is prime to N. Hence to every positive integer i there is a unique positive integer j (i, j ≤ N) such that

$i = (j-1)h + 1.$

Now

$p_i + 1 = p_{(j-1)h+1} + 1 = j + 1 = p_{jh+1} = p_{i+h},$

which is the required result.

4 See footnote 3.
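The rearrangement behind this argument is easy to check numerically. In the sketch below indices are 0-based, so the element placed in position j is the one with original index jh modulo N; this is the inverse view of the permutation p in the text, and the function names are illustrative assumptions.

```python
from math import gcd

def circular_serial_product(x, h):
    """Sum of x[a] * x[a + h] with indices taken modulo N."""
    N = len(x)
    return sum(x[a] * x[(a + h) % N] for a in range(N))

def rearrange_for_lag(x, h):
    """Return y with y[j] = x[(j*h) mod N], so lag-1 neighbours in y are lag-h neighbours in x."""
    N = len(x)
    if gcd(h, N) != 1:
        raise ValueError("h must be prime to N")
    return [x[(j * h) % N] for j in range(N)]

x = [3, 1, 4, 1, 5, 9, 2]   # N = 7
y = rearrange_for_lag(x, 3)  # h = 3 is prime to 7
print(circular_serial_product(x, 3), circular_serial_product(y, 1))
```

Because y is a permutation of x, every value of the lag-h statistic is also a value of the lag-1 statistic, which is the content of the statement that the two totalities coincide.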

In what follows we shall restrict ourselves to the case when h is prime to N. This is not a very restrictive assumption since in practice h will be small as compared with N and by omitting a few observations we can always make N prime to h. Since $T_h$ is the same as $T_1$ we shall deal with the statistic $\bar{R}_1$ only. To simplify the notation we shall write R instead of $\bar{R}_1$. Thus, the test procedure will be based on the statistic

(4) $R = \sum_{\alpha=1}^{N-1} x_\alpha x_{\alpha+1} + x_N x_1.$


If N is very small an exact test of significance can be carried out by actually 
calculating the N! possible values of R. However, this procedure is practically 
impossible if N is not small. In Section 3 the exact mean value and variance of 
R will be calculated, and in section 4 the normality of the limiting distribution 
of R will be proved. Thus, if N is sufficiently large so that the limiting distribu- 
tion of R can be used, a test of significance can easily be carried out. Difficulties 
in carrying out the test arise if N is neither sufficiently small to make the computa- 
tion of the N! values of R practically possible, nor sufficiently large to permit the 
use of the limiting distribution. In such cases it may be helpful to determine 
the third and fourth, and perhaps higher, moments of R, on the basis of which
upper and lower limits for the cumulative distribution of R can be derived. 
(For a description of the Tchebycheff inequalities by which this can be done see, 
for example, Uspensky, [8], pp. 373-380.) Since the limiting distribution is 
normal it may be useful to approximate the distribution by a Gram-Charlier 
series or to employ similar methods. 


3. Mean value and variance of R.⁵ It is clear that

(5) $E(R) = N E(x_1 x_2) = \dfrac{N}{N(N-1)} \sum\sum_{\alpha \ne \beta} a_\alpha a_\beta = \dfrac{1}{N-1}\left[(a_1 + \cdots + a_N)^2 - (a_1^2 + \cdots + a_N^2)\right].$

To calculate the variance of R we first calculate the second moment of R about the origin. We have

(6) $E(R^2) = E(x_1 x_2 + \cdots + x_{N-1} x_N + x_N x_1)^2 = N E(x_1^2 x_2^2) + 2N E(x_1 x_2^2 x_3) + (N^2 - 3N) E(x_1 x_2 x_3 x_4).$

5 The first four moments of a similar statistic have been obtained by Young [3].

To express the expected values $E(x_1^2 x_2^2)$, $E(x_1 x_2^2 x_3)$, and $E(x_1 x_2 x_3 x_4)$ we shall introduce the following notations for the symmetric functions of $a_1, \cdots, a_N$: For any set of positive integers $i_1, i_2, \cdots, i_k$ the symbol $S_{i_1 i_2 \cdots i_k}$ denotes the symmetric function $\sum_{\alpha_k} \cdots \sum_{\alpha_1} a_{\alpha_1}^{i_1} \cdots a_{\alpha_k}^{i_k}$ where the summation is to be taken over all possible sets of k positive integers $\alpha_1, \cdots, \alpha_k$ subject to the restriction that $\alpha_u \le N$ and $\alpha_u \ne \alpha_v$ (u ≠ v; u, v = 1, ···, k).

From (6) we easily obtain

(7) $E(R^2) = \dfrac{N}{N(N-1)} S_{22} + \dfrac{2N}{N(N-1)(N-2)} S_{121} + \dfrac{N^2 - 3N}{N(N-1)(N-2)(N-3)} S_{1111}$
    $= \dfrac{S_{22}}{N-1} + \dfrac{2S_{121}}{(N-1)(N-2)} + \dfrac{S_{1111}}{(N-1)(N-2)}.$


It will probably facilitate computation to express each of the symmetric functions in the right member of (7) by a sum of terms, each a product of factors $S_r$ (r = 1, 2, ···). One can easily verify the relationships

(8) $S_{11} = S_1^2 - S_2$
(9) $S_{12} = S_{21} = S_1 S_2 - S_3$
(10) $S_{13} = S_{31} = S_1 S_3 - S_4$
(11) $S_{22} = S_2^2 - S_4$
(12) $S_{111} = S_{11} S_1 - 2S_{12} = (S_1^2 - S_2)S_1 - 2(S_1 S_2 - S_3) = S_1^3 - 3S_1 S_2 + 2S_3$
(13) $S_{112} = S_{121} = S_{211} = S_{11} S_2 - 2S_{13} = (S_1^2 - S_2)S_2 - 2(S_1 S_3 - S_4) = S_1^2 S_2 - S_2^2 - 2S_1 S_3 + 2S_4$
(14) $S_{1111} = S_{111} S_1 - 3S_{112} = S_1^4 - 3S_1^2 S_2 + 2S_1 S_3 - 3S_1^2 S_2 + 3S_2^2 + 6S_1 S_3 - 6S_4 = S_1^4 - 6S_1^2 S_2 + 8S_1 S_3 + 3S_2^2 - 6S_4.$


+ 


1 
N-1 
and from (7), (11), (13), (14), and (15) that the variance of R is given by 

o(R) = E(R’) — [E(R)P 
(16) _Si— Si, Si— 48S. +45:S;+ 83-28% 1 


71" (N — 1)(W — 2) (NW — 1: S: — ©) 


(15) E(R) = (Si — S:), 
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The mean value and variance of R can easily be computed from (15) and (16) 
as soon as the values of S; , Sz , Ss; , and S, have been determined. 
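Formulas (15) and (16) can be compared with a direct enumeration of all N! permutations when N is small. The following sketch does so with exact rational arithmetic; the function names and the sample values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def R_stat(x):
    """The statistic (4): circular sum of products of neighbouring values."""
    N = len(x)
    return sum(x[i] * x[(i + 1) % N] for i in range(N))

def moments_by_enumeration(a):
    """Mean and variance of R over all N! equally likely permutations."""
    vals = [R_stat(perm) for perm in permutations(a)]
    mean = Fraction(sum(vals), len(vals))
    var = Fraction(sum(v * v for v in vals), len(vals)) - mean ** 2
    return mean, var

def moments_by_formula(a):
    """Mean (15) and variance (16) expressed through the power sums S_1 .. S_4."""
    N = len(a)
    S1, S2, S3, S4 = (sum(Fraction(v) ** r for v in a) for r in (1, 2, 3, 4))
    mean = (S1**2 - S2) / (N - 1)
    var = ((S2**2 - S4) / (N - 1)
           + (S1**4 - 4*S1**2*S2 + 4*S1*S3 + S2**2 - 2*S4) / ((N - 1) * (N - 2))
           - (S1**2 - S2)**2 / (N - 1)**2)
    return mean, var

a = [2, -1, 3, 5, 4]
print(moments_by_enumeration(a))
print(moments_by_formula(a))
```

The formula needs only four power sums, whereas the enumeration visits N! arrangements, which is the practical reason for preferring (15) and (16).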


The formulas (15) and (16) are considerably simplified if $S_1 = 0$. In the special case that $S_1 = 0$ we have

(15') $E(R) = -\dfrac{S_2}{N-1}$

and

(16') $\sigma^2(R) = \dfrac{S_2^2 - S_4}{N-1} + \dfrac{S_2^2 - 2S_4}{(N-1)(N-2)} - \dfrac{S_2^2}{(N-1)^2}.$

We can always make $S_1$ equal to zero by replacing $a_\alpha$ by $b_\alpha = a_\alpha - \frac{1}{N}\sum_\alpha a_\alpha$. This substitution is permissible, since it changes the statistic R only by an additive constant and consequently leaves the test procedure unaffected. Thus, in practical applications it may be convenient to replace $a_\alpha$ by $b_\alpha$ and to use formulas (15') and (16').


4. Limiting distribution of R. Let {aa} (a = 1, 2, --- ad inf.) be a sequence 
of real numbers with the following properties: 
a) There exists a sequence of numbers A;, Az, --:, A,r, -:+ such that 


N 
(17) = 7 < A, (r = 1, 2, — 
N a=1 


for all N. (This condition means that the moments about the origin of the 
sequence @; , d2, --+ , dy are bounded functions of N.) 


b) If 
8(N) = Pe _ (de) |, 
then 


(18) lim inf 6(N) > 0. 
N 


- ad inf.) 


(This condition means that the dispersion of the N values a;, a, -°--, @w iS 
eventually bounded below.) 
Let R(N) be the serial correlation coefficient R as defined in (4), where 11, --- 
ty is a random permutation of a; , d2, +--+ ,@y. Weshall prove the following 
THEOREM: As N — ~, the probability that 


R(N) — E(RW)) 


RN) ~~. 


approaches the limit 


—}22 
Ti 2r [ie e., 
For any function $f(N)$ and any positive function $\varphi(N)$ let

$$f(N) = O(\varphi(N))$$

mean that $|f(N)/\varphi(N)|$ is bounded from above for all N, and let

$$f(N) = \Omega(\varphi(N))$$

mean that

$$f(N) = O(\varphi(N))$$

and that $\liminf |f(N)/\varphi(N)| > 0$. Also let

$$f(N) = o(\varphi(N))$$

mean that

$$\lim_{N \to \infty} \frac{f(N)}{\varphi(N)} = 0.$$

Let $[p]$ denote the largest integer less than or equal to p.


To simplify the proof we shall temporarily assume:

c) There exists a positive constant K such that, for every positive integral N,

$$-K < S_1 = \sum_{\alpha=1}^{N} a_\alpha < K. \tag{19}$$

This restriction will be removed later.
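LEMMA 1:

$$\sum_{\alpha_1 < \cdots < \alpha_k} a_{\alpha_1} a_{\alpha_2} \cdots a_{\alpha_k} = O(N^{[\frac{1}{2}k]}).$$

PROOF: $\sum_{\alpha_1 < \cdots < \alpha_k} a_{\alpha_1} \cdots a_{\alpha_k}$ can be written as the sum of a finite
number of terms where each term is a product of factors $S_r$ ($r = 1, 2, \cdots$).
This representation will be called the normal representation of
$\sum_{\alpha_1 < \cdots < \alpha_k} a_{\alpha_1} \cdots a_{\alpha_k}$. Since $S_1 = O(1)$ by (19) and $S_r = O(N)$ by (17), and since the number of
factors $S_r$ ($r > 1$) in a single term of the normal representation of
$\sum_{\alpha_1 < \cdots < \alpha_k} a_{\alpha_1} \cdots a_{\alpha_k}$ is at most $[\frac{1}{2}k]$, the equation
$\sum_{\alpha_1 < \cdots < \alpha_k} a_{\alpha_1} \cdots a_{\alpha_k} = O(N^{[\frac{1}{2}k]})$ must hold.

LEMMA 2: Let $y = x_1 \cdots x_k z$, where $z = x_{k+1}^{i_1} \cdots x_{k+r}^{i_r}$ and $i_j > 1$ ($j = 1, \cdots, r$).
If $(x_1, \cdots, x_N)$ is a random permutation of $a_1, \cdots, a_N$, and if $k, r, i_1, \cdots, i_r$
are fixed values independent of N, then $E(y) = O(N^{[\frac{1}{2}k] - k})$.

PROOF: Let $E(y \mid x_{k+1}, \cdots, x_{k+r})$ be the conditional expected value of y when
$x_{k+1}, \cdots, x_{k+r}$ are fixed. It follows easily from Lemma 1 that

$$E(y \mid x_{k+1}, \cdots, x_{k+r}) = O(N^{[\frac{1}{2}k] - k}).$$

Hence also $E(y) = O(N^{[\frac{1}{2}k] - k})$ and Lemma 2 is proved.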
Denote $x_\alpha x_{\alpha+1}$ by $y_\alpha$ ($\alpha = 1, \cdots, N - 1$) and $x_N x_1$ by $y_N$, and consider the
expansion of $(y_1 + \cdots + y_N)^r$. Let y be a term of this expansion, i.e.,
$y = y_{\alpha_1}^{i_1} y_{\alpha_2}^{i_2} \cdots y_{\alpha_u}^{i_u}$ ($\alpha_1 < \alpha_2 < \cdots < \alpha_u$). We will say that two factors $y_\alpha$
and $y_\beta$ are neighbors if $|\alpha - \beta + 1|$ or $|\alpha - \beta - 1|$ is either 0 or N. The set of
u factors $y_{\alpha_1}, \cdots, y_{\alpha_u}$ can be subdivided into cycles as follows: The first cycle
contains $y_{\alpha_1}$ and all those $y_\alpha$ which can be reached from $y_{\alpha_1}$ by a succession of
neighboring $y_\alpha$. The second cycle contains the first $y_\alpha$ of the remaining sequence
and all those which can be reached from the first $y_\alpha$ by a succession of
neighboring $y_\alpha$. The third cycle is similarly constructed from the remaining
sequence, etc. After a finite number of cycles have been withdrawn the sequence
will be exhausted. If m is the number of such cycles we will say that y has m
cycles.

LEMMA 3: Let y be a term of the expansion $(x_1 x_2 + \cdots + x_N x_1)^r = (y_1 + \cdots + y_N)^r$
(r fixed). Let m be the number of cycles in y and k be the number of linear
factors in y if y is written as a function of $x_1, \cdots, x_N$ (i.e., if we replace $y_\alpha$ by
$x_\alpha x_{\alpha+1}$). Then the maximum value of $m + [\frac{1}{2}k] - k$ is equal to $[\frac{1}{2}r]$.

PROOF: First we maximize $m + [\frac{1}{2}k] - k$ with respect to k when m is fixed.
If $m \le [\frac{1}{2}r]$, then the minimum value of k is obviously zero. Let $m = [\frac{1}{2}r] + r'$
($r' > 0$). The minimum value of k is reached if each cycle consists of a single
factor $y_\alpha$ and if each factor $y_\alpha$ in y is either linear or squared. If r is even, then
the minimum value of k is $4r'$, and if r is odd then the minimum value of k is
$4r' - 2$. Hence for $m = [\frac{1}{2}r] + r'$ we have

$$\max\,(m + [\tfrac{1}{2}k] - k) = [\tfrac{1}{2}r] - r' \quad \text{if r is even}$$

and

$$\max\,(m + [\tfrac{1}{2}k] - k) = [\tfrac{1}{2}r] - r' + 1 \quad \text{if r is odd.}$$

Hence maximizing with respect to m and k we obtain

$$\max\,(m + [\tfrac{1}{2}k] - k) = [\tfrac{1}{2}r],$$

and Lemma 3 is proved.

LEMMA 4: The expected value of the sum of all those terms in the expansion of
$(x_1 x_2 + \cdots + x_N x_1)^r$ for which m is the number of cycles and k the number of linear
factors (if y is expressed in terms of $x_1, \cdots, x_N$) is equal to $O(N^{m + [\frac{1}{2}k] - k})$.

This Lemma follows from Lemma 2 and the fact that the number of terms y
with the required properties is $O(N^m)$.

LEMMA 5:

$$E(x_1 x_2 + \cdots + x_N x_1)^r = O(N^{[\frac{1}{2}r]}).$$

This follows from Lemmas 3 and 4.
LEMMA 6: If r is even then

$$E(x_1 x_2 + \cdots + x_N x_1)^r = 2^{-\frac{1}{2}r}\, r! \binom{N}{\frac{1}{2}r} E(x_1^2 \cdots x_r^2) + o(N^{\frac{1}{2}r}).$$
PROOF: It follows easily from our considerations in proving Lemma 3 that
$m + [\frac{1}{2}k] - k < \frac{1}{2}r$ for all terms in the expansion of $(x_1 x_2 + \cdots + x_N x_1)^r$ which
are not of the type $x_1^2 \cdots x_r^2$. Hence it follows from Lemma 4 that the expected
value of the sum of all those terms in the expansion of $(x_1 x_2 + \cdots + x_N x_1)^r$
which are not of the type $x_1^2 \cdots x_r^2$ is equal to $o(N^{\frac{1}{2}r})$. Lemma 6 follows from the
fact that $2^{-\frac{1}{2}r}\, r!$ is the coefficient of the terms of the type $x_1^2 \cdots x_r^2$ in the expansion
of $(x_1 x_2 + \cdots + x_N x_1)^r$ and that the number of terms of such type is equal to
$\binom{N}{\frac{1}{2}r} + o(N^{\frac{1}{2}r})$.

LEMMA 7:

$$\lim_{N \to \infty} \frac{E(x_1 x_2 + \cdots + x_N x_1)^r}{[E(x_1 x_2 + \cdots + x_N x_1)^2]^{\frac{1}{2}r}}$$

is equal to 0 if r is odd and to $2^{-\frac{1}{2}r}\, r!/(\frac{1}{2}r)!$ if r is even.

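PROOF: From Lemma 6 it follows that

$$E(x_1 x_2 + \cdots + x_N x_1)^2 = N E(x_1^2 x_2^2) + o(N) = \Omega(N). \tag{20}$$

The first half of Lemma 7 follows from Lemma 5 and equation (20). If r
is even then it follows from (20) that

$$\lim_{N \to \infty} \frac{E(x_1 x_2 + \cdots + x_N x_1)^r}{\{E(x_1 x_2 + \cdots + x_N x_1)^2\}^{\frac{1}{2}r}} = \lim_{N \to \infty} \frac{2^{-\frac{1}{2}r} \binom{N}{\frac{1}{2}r}\, r!\, E(x_1^2 \cdots x_r^2)}{N^{\frac{1}{2}r}\, (E\, x_1^2 x_2^2)^{\frac{1}{2}r}} = \lim_{N \to \infty} \frac{r!}{2^{\frac{1}{2}r} (\frac{1}{2}r)!} \cdot \frac{E(x_1^2 \cdots x_r^2)}{(E(x_1^2 x_2^2))^{\frac{1}{2}r}}. \tag{21}$$

It follows from (17), (19), and the normal representation of symmetric functions
that

$$k! \sum_{\alpha_1 < \alpha_2 < \cdots < \alpha_k} a_{\alpha_1}^2 a_{\alpha_2}^2 \cdots a_{\alpha_k}^2 = S_2^k + O(N^{k-1}).$$

From (17) and (18) we have $S_2 = \Omega(N)$. Since

$$E(x_1^2 \cdots x_r^2) = r! \left(\sum_{\alpha_1 < \alpha_2 < \cdots < \alpha_r} a_{\alpha_1}^2 \cdots a_{\alpha_r}^2\right) \Big/ \left[N(N - 1) \cdots (N - r + 1)\right],$$

we obtain

$$\lim_{N \to \infty} \frac{E(x_1^2 \cdots x_r^2)}{(E(x_1^2 x_2^2))^{\frac{1}{2}r}} = 1. \tag{22}$$

The second half of Lemma 7 follows from (21) and (22).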
LEMMA 8:

$$\lim_{N \to \infty} \frac{E(R(N))}{\sigma(R(N))} = 0, \tag{23}$$

$$\lim_{N \to \infty} \frac{E(R^2(N))}{\sigma^2(R(N))} = 1. \tag{24}$$

PROOF: Equation (24) is a trivial consequence of (23). From (15) $E(R) = O(1)$
and from (16) $\sigma(R) = \Omega(N^{\frac{1}{2}})$. The lemma follows easily from these relations.
PROOF OF THE THEOREM: According to Lemma 7 the r-th moment of $R/[E(R^2)]^{\frac{1}{2}}$
approaches the r-th moment of the normal distribution as $N \to \infty$. From this
and Lemma 8 the required result follows if condition (c) holds. It remains
therefore merely to remove condition (c). Assume now only that $a_1, a_2, \cdots, a_r, \cdots$
satisfy conditions (a) and (b).

R(N) is formed from the population of values $a_1, a_2, \cdots, a_N$. Addition of
a constant q to $a_1, \cdots, a_N$ adds the same constant to all the values of R(N) and
hence leaves $[R(N) - E(R(N))]/\sigma(R(N))$ unaltered. Let $q^{(N)}$ be $-\sum_{\alpha=1}^{N} a_\alpha / N$
and write $b_\alpha^{(N)} = a_\alpha + q^{(N)}$. Consider the sequences

$$B^{(N)} = b_1^{(N)}, b_2^{(N)}, \cdots, b_N^{(N)} \qquad (N = 1, 2, \cdots, \text{ad inf.}).$$

From (17) it follows that the $|q^{(N)}|$ are bounded for all N. Hence the sequences
$B^{(N)}$ satisfy condition (a). They obviously satisfy condition (c). Since
$\delta(N)$ is invariant under addition of a constant we have

$$\liminf_{N} \frac{1}{N}\left[\sum_{\alpha=1}^{N} \left(b_\alpha^{(N)}\right)^2 - \frac{1}{N}\left(\sum_{\alpha=1}^{N} b_\alpha^{(N)}\right)^2\right] > 0,$$

so that the $B^{(N)}$ satisfy condition (b). Since $[R(N) - E(R(N))]/\sigma(R(N))$ has
the same distribution in the sequence $a_1, a_2, \cdots, a_N$ as in the sequence $B^{(N)}$,
the theorem follows.


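It should be remarked that the theorem remains valid if conditions (a) and
(b) are replaced by the weaker condition

$$\mu_r / \mu_2^{\frac{1}{2}r} = O(1) \qquad (r = 3, 4, \cdots, \text{ad inf.})$$

where

$$\mu_r = \frac{1}{N} \sum_{\alpha=1}^{N} \left(a_\alpha - \frac{1}{N} \sum_{\beta=1}^{N} a_\beta\right)^r.$$

This follows easily from the fact that $[R(N) - E(R(N))]/\sigma(R(N))$ remains
unaltered if we replace the sequence $a_1, \cdots, a_N$ by the sequence $c_1, c_2, \cdots, c_N$
where

$$c_\alpha = \left(a_\alpha - \frac{1}{N} \sum_{\beta} a_\beta\right) \Big/ \left[\frac{1}{N} \sum_{\alpha} \left(a_\alpha - \frac{1}{N} \sum_{\beta} a_\beta\right)^2\right]^{\frac{1}{2}}.$$

Conditions (a) and (b) are obviously satisfied by the sequence $c_1, \cdots, c_N$.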


5. Transformation of the original observations. 


Let f(t) be a continuous and strictly monotonic function of t ($-\infty < t < +\infty$).
Suppose we replace the original observations $a_1, \cdots, a_N$ by $d_1, \cdots, d_N$, where
$d_\alpha = f(a_\alpha)$ ($\alpha = 1, \cdots, N$). We obtain a valid test of significance if we carry
out the test procedure as if $d_1, \cdots, d_N$ were the observed values instead of
$a_1, \cdots, a_N$. We could also replace the observed values $a_1, \cdots, a_N$ by their
ranks. The question arises whether there is any advantage in making the test
on the transformed values instead of on the original observations. It may well
be that by certain transformations we could considerably increase the power of
the test with respect to alternatives under consideration. This problem needs
further study.

6. Summary. A test procedure based on serial correlation is given for testing
the hypothesis that $x_1, \cdots, x_N$ are independent observations from the same
population, i.e., that $x_1, \cdots, x_N$ is a random series. By considering the distribution
of the serial correlation coefficient in the subpopulation consisting of
all permutations of the actually observed values, a test procedure is obtained
such that

a) if the common c.d.f. F(x) is continuous, the size of the critical region,
i.e., the probability of rejecting the hypothesis of randomness when it is true,
does not depend upon F(x),

b) if F(x) is not continuous but all its moments are finite and its variance is
positive, the size of the critical region approaches, as $N \to \infty$, the value it
would have if F(x) were continuous. Thus in the limit an exact test is possible
in this case as well.

It is shown that the test based on the serial correlation with lag h is equivalent
to the test based on the statistic

$$\sum_{\alpha=1}^{N} x_\alpha x_{h+\alpha}$$

where $x_{h+\alpha}$ is to be replaced by $x_{h+\alpha-N}$ for all values of $\alpha$ for which $h + \alpha > N$.
If h is prime to N, the distribution of $\sum_{\alpha=1}^{N} x_\alpha x_{h+\alpha}$ is exactly the same as the
distribution of $R = \sum_{\alpha=1}^{N} x_\alpha x_{\alpha+1}$.

The mean value and variance of R are given by the following expressions:

$$E(R) = (S_1^2 - S_2)/(N - 1)$$

and

$$\sigma^2(R) = \frac{S_2^2 - S_4}{N-1} + \frac{S_1^4 - 4S_1^2 S_2 + 4S_1 S_3 + S_2^2 - 2S_4}{(N-1)(N-2)} - \frac{(S_1^2 - S_2)^2}{(N-1)^2},$$

where $S_r = x_1^r + \cdots + x_N^r$.

It is shown that under some mild restrictions the limiting distribution of R is
normal. The test procedure can therefore be easily carried out when N is
sufficiently large to permit the use of the limiting distribution of R.
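As an illustration of how the large-sample procedure summarized above might be carried out, the following sketch standardizes the circular statistic R with the centered formulas (15') and (16') and refers it to the normal distribution. The function names are hypothetical, and the two-sided form of the p-value is an assumption made here for illustration; the choice of critical region depends on the alternatives considered.

```python
import math


def serial_correlation_z(x):
    """Standardized circular statistic [R - E(R)] / sigma(R), via (15') and (16').

    Requires at least three observations that are not all equal.
    """
    n = len(x)
    mean_x = sum(x) / n
    b = [v - mean_x for v in x]  # centering makes S1 = 0
    r = sum(b[i] * b[(i + 1) % n] for i in range(n))
    s2 = sum(v**2 for v in b)
    s4 = sum(v**4 for v in b)
    mean_r = -s2 / (n - 1)
    var_r = ((s2**2 - s4) / (n - 1)
             + (s2**2 - 2 * s4) / ((n - 1) * (n - 2))
             - s2**2 / (n - 1) ** 2)
    return (r - mean_r) / math.sqrt(var_r)


def two_sided_p_value(z):
    """Normal-approximation p-value for |Z| >= |z| (assumed two-sided test)."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    trend = list(range(1, 21))  # hypothetical, strongly trending series
    z = serial_correlation_z(trend)
    print(z, two_sided_p_value(z))
```

For small N the normal approximation may be poor, and the exact permutation distribution of R would be the safer reference.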
ON A GENERAL CLASS OF “CONTAGIOUS” DISTRIBUTIONS
By W. FELLER 


Brown University 


1. Introduction. In a paper of considerable interest, J. Neyman [11] recently 
discussed frequently occurring situations where the usual tests of significance 
fail. He discussed, in particular, experiences in entomology and bacteriology 
which cannot be described by the usual distribution functions and he constructed 
several new types of apparently contagious distributions. Now at first glance 
Neyman’s investigation may seem of a rather specialized nature, and his distri- 
butions of a restricted applicability. It may therefore be useful to point out 
that they are intimately related to results obtained by various authors in con- 
nection with topics having so little apparent relation as accident statistics, tele- 
phone traffic, fire damage, sickness- and life-insurance, risk theory, and even an 
engineering problem. Viewed in the proper light of a general theory, Neyman’s 
method is particularly closely related to some too little known considerations by 
Greenwood and Yule [6]. These authors were the first to find, and apply, the 
distribution which shortly afterwards was independently rediscovered by Eggenberger
and Polya¹ [3, 4].

Greenwood and Yule discussed two types of what may conveniently be called 
contagion: with one type there is true contagion in the sense of Polya and Eggenberger,
where each “favorable” event increases (or decreases) the probability
of future favorable events; with the second type the events are, strictly speak- 
ing, independent and an apparent contagion is actually due to an inhomogeneity 
of the population. The two explanations are very different in nature as well as 
in practical implications. It is therefore most remarkable that Greenwood and 
Yule found their distribution assuming an apparent contagion; in their opinion 
this distribution contradicts true contagion. On the contrary, Polya and Eggen- 
berger arrived at the same distribution assuming true contagion, while the possi- 
bility of an apparent contagion due to inhomogeneity seems not to have been 
noticed by them. The Greenwood-Yule-Polya-Eggenberger distribution has 
found many applications.² Therefore the possibility of its interpretation in two
ways, diametrically opposite in their nature as well as in their implications is of 
greatest statistical significance. This fact is, incidentally, a justification for 
general theories in statistics. 

We shall see that Neyman’s contagious distributions belong to the second
type and are related to the Polya-Eggenberger distribution only if the latter is
interpreted in the sense of Greenwood and Yule. In Neyman’s case as well as
in the other cases referred to above we are concerned with inhomogeneous populations
and there exists an extremely simple device to describe such situations
appropriately. Once stated, this device will appear trivial. Nevertheless, a
straightforward application of it would have avoided considerable mathematical
difficulties in the literature and, occasionally, yielded better and simpler results.
It seems also the simplest description of the mechanism behind many observed
distributions, and therefore suited for a theory of tests³.

1 The fact that the Polya-Eggenberger distribution is identical with the Greenwood-Yule
distribution seems to be mentioned in the literature only in a Stockholm thesis by O. Lundberg [9].

2 Of quite recent applications we mention Kitagawa and Huruya [8], Rosenblatt [15],
O. Lundberg [9]. Only the latter seems aware of the double nature of the distribution.

To start in a purely formal manner, consider an arbitrary cumulative distribution
function (c.d.f.) $F(x, a)$, depending on a parameter a, and another c.d.f.
$U(a)$. Then

$$G(x) = \int F(x, a)\, dU(a) \tag{1.1}$$

(the integration extending over the domain of variation of a) is again a c.d.f.
If, in particular, $U(a)$ is a step function, (1.1) reduces to

$$G(x) = \sum_i p_i F(x, a_i), \tag{1.2}$$

where $p_i$ is the weight attached to $a_i$ (we have, of course, $p_i > 0$, $\sum p_i = 1$).
Instead of (1.2) one can write more simply

$$G(x) = \sum_i p_i F_i(x), \tag{1.3}$$

where the $F_i(x)$ are arbitrary c.d.f.’s. Of course, $F(x, a)$ and $U(a)$ may depend
on additional parameters, and the procedure can be repeated.

The statistical meaning of (1.3) is clear. Consider a population made up of
several subgroups $A_1, A_2, \cdots$, mixed at random in proportions $p_1 : p_2 : \cdots$.
If $F_i(x)$ is the c.d.f. of some character in $A_i$, then $G(x)$, as defined by (1.3), will
represent the c.d.f. of that character in the total population, provided that the
subgroups $A_i$ are statistically independent. Similarly (1.1) describes an infinitely
composite population. Postponing a discussion of the property of contagion
to the last section, we shall first deduce a few properties of the compound
Poisson distribution, considered first by Greenwood and Yule. Neyman’s
“Contagious Distributions of Type A” as well as the Polya-Eggenberger distribution
belong to this class. Our next example of a special case of (1.1) is what
F. E. Satterthwaite [16] called the “Generalized Poisson Distribution.” It has
been independently discovered by many authors and represents heterogeneity
of quite different a nature. Instead of further examples we shall, in the fourth
section, show how Neyman’s most general contagious distribution can be deduced
by a repeated application of (1.1).

3 Incidentally, attention may be drawn to an argument by Greenwood and Yule showing
that the $\chi^2$-test when applied to the Poisson distribution is biased and tends to exaggerate
the goodness of fit. The argument could be amplified from other experience.
Notation: If $F(x)$ and $G(x)$ are the c.d.f.’s of two independent variates X and
Y, then their convolution (that is to say, the c.d.f. of X + Y) will be denoted by
$F(x) * G(x)$. Thus

$$F(x) * G(x) = \int_{-\infty}^{\infty} F(x - y)\, dG(y). \tag{1.4}$$

More particularly we shall write

$$F(x) * F(x) = F^{2*}(x), \qquad F^{n*}(x) * F(x) = F^{(n+1)*}(x). \tag{1.5}$$

We shall denote by $E(x)$ the unitary c.d.f.

$$E(x) = \begin{cases} 0 & \text{for } x < 1, \\ 1 & \text{for } x \ge 1, \end{cases} \tag{1.6}$$

so that $E^{n*}(x) = 0$ for $x < n$, and 1 for $x \ge n$.


2. The compound Poisson distribution. Consider the well-known Poisson
expression

$$\pi(n; a) = e^{-a} \frac{a^n}{n!}, \tag{2.1}$$

where the parameter $a > 0$ gives the expected number of “events”. We shall
refer to (2.1) as the simple Poisson distribution. If different individuals of a
population are associated with different values of a, and if the character a is
distributed according to the cumulative probability law $U(a)$, the probability
of n events in the total population will be given by

$$\pi_n = \int_0^\infty e^{-a} \frac{a^n}{n!}\, dU(a). \tag{2.2}$$

Following Greenwood and Yule we shall refer to (2.2) as the compound Poisson
distribution. Referring for an interpretation to the last section, we first consider
a few special cases.

a) If $U(a)$ is a step function we are led to expressions of the form

$$\pi_n = \sum_i p_i\, e^{-a_i} \frac{a_i^n}{n!}. \tag{2.3}$$


Such a distribution has been successfully applied by C. Palm [12] to problems of 
telephone traffic, and by O. Lundberg [9] to sickness statistics. 
b) If $U(a)$ is a Pearson Type III distribution

$$U'(a) = \frac{1}{\Gamma(h/d)} \left(\frac{1}{d}\right)^{h/d} a^{h/d - 1} e^{-a/d} \tag{2.4}$$

(with $d > 0$, $h > 0$), then

$$\pi_n = \left(\frac{1}{1+d}\right)^{h/d} \binom{h/d + n - 1}{n} \left(\frac{d}{1+d}\right)^{n}. \tag{2.5}$$

This is the Polya-Eggenberger distribution in its usual form, and has in this form
(with a slight change of notations) been derived by Greenwood and Yule.
c) If a takes on the values $kc$ only, where $c > 0$ is a constant and $k = 0, 1, \cdots$,
and if a is distributed according to the Poisson law

$$\text{Prob}\{a = kc\} = e^{-\lambda} \frac{\lambda^k}{k!}, \tag{2.6}$$

then

$$\pi_n = e^{-\lambda} \frac{c^n}{n!} \sum_{k=0}^{\infty} \frac{k^n}{k!} \left(e^{-c} \lambda\right)^k. \tag{2.7}$$

This is Neyman’s contagious distribution of type A depending on two parameters
(cf. section 4). If, instead, a is distributed according to a multiple Poisson law
of form (2.3) we arrive at Neyman’s more-parametric distribution of type A.
They are, of course, essentially linear combinations of expressions of form (2.7).

It follows from the theory of Laplace transforms that two compound Poisson 
distributions associated with different c.d.f.’s U(a) are never identical. 

The compound Poisson distribution gives a simple explanation of a phenomenon
recorded by Neyman and observable in many instances. In the experiments
described by Neyman “the attempts to fit the Poisson Law ... failed
almost invariably with the characteristic feature that, as compared with the
Poisson Law, there were too many empty plots and too few plots with only one
larva”. It is easily checked in the literature that similar situations arise frequently.
Now the Poisson distribution is usually fitted by the method of
moments. Accordingly, the compound Poisson law (2.2) ought to be compared
with the simple Poisson distribution with the same mean value. The mean
value of (2.2) is

$$m = \int_0^\infty a\, dU(a), \tag{2.8}$$

so that (2.2) ought to be compared with the Poisson distribution $\pi(n; m)$. Now,
whatever the c.d.f. $U(a)$, we have always

$$\pi_0 \ge \pi(0; m) \tag{2.9}$$

and

$$\frac{\pi_1}{\pi_0} \le m = \frac{\pi(1; m)}{\pi(0; m)}. \tag{2.10}$$
As a matter of fact, using Lagrange’s form for the remainder in Taylor’s formula,
we have

$$\pi_0 = e^{-m} \int_0^\infty e^{m-a}\, dU(a) \ge e^{-m} \int_0^\infty \{1 + (m - a)\}\, dU(a) = e^{-m} = \pi(0; m), \tag{2.11}$$

which proves (2.9). Similarly

$$m\pi_0 - \pi_1 = e^{-m} \int_0^\infty e^{m-a} (m - a)\, dU(a) \ge e^{-m} \int_0^\infty (m - a)\, dU(a) = 0, \tag{2.12}$$

which proves (2.10).
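The inequalities (2.9) and (2.10) are easy to check numerically when $U(a)$ is a step function, so that the compound law has the form (2.3). The sketch below is only an illustration; the weights and rates are hypothetical values chosen for the example, and the function names are not taken from the paper.

```python
import math


def poisson_pmf(n, a):
    """Simple Poisson probability pi(n; a) of (2.1)."""
    return math.exp(-a) * a**n / math.factorial(n)


def compound_poisson_pmf(n, weights, rates):
    """pi_n of (2.3): U(a) has jumps weights[i] at the points rates[i]."""
    return sum(p * poisson_pmf(n, a) for p, a in zip(weights, rates))


def mixture_mean(weights, rates):
    """Mean value m of (2.8) for a step function U(a)."""
    return sum(p * a for p, a in zip(weights, rates))


if __name__ == "__main__":
    weights = [0.7, 0.3]   # hypothetical proportions of two subgroups
    rates = [0.5, 4.0]     # hypothetical Poisson parameters of the subgroups
    m = mixture_mean(weights, rates)
    pi0 = compound_poisson_pmf(0, weights, rates)
    pi1 = compound_poisson_pmf(1, weights, rates)
    print("pi_0 =", pi0, " pi(0; m) =", poisson_pmf(0, m))
    print("pi_1/pi_0 =", pi1 / pi0, " m =", m)
```

In this example the mixture shows more empty cells than the simple Poisson law with the same mean, and a smaller ratio of single-event to no-event cells, as (2.9) and (2.10) require.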

The above theorem shows that, whenever the material under observation is not
quite homogeneous so that the compound Poisson law applies instead of the simple
one, there will be too many cases with “no event” and, as compared with these cases,
too few with “one event”. It should be noticed, however, that it is not strictly
true that always

$$\pi_1 \le \pi(1; m). \tag{2.13}$$

As a matter of fact, even in the numerical example given by Neyman, the computed
value $\pi_1$ exactly equals the observed value. Still, the inequality (2.13)
will hold whenever the third moment about the mean of $U(a)$ is smaller than
twice the second. Writing

$$\sigma^2 = \int_0^\infty (a - m)^2\, dU(a), \qquad \mu_3 = \int_0^\infty (a - m)^3\, dU(a), \tag{2.14}$$

and using two more terms in the Taylor development of $e^{m-a}$ than in (2.11) and
(2.12) we see that

$$\pi_0 \ge e^{-m} \left\{1 + \frac{\sigma^2}{2} - \frac{\mu_3}{6}\right\} \tag{2.15}$$

and

$$m\pi_0 - \pi_1 \ge e^{-m} \left\{\sigma^2 - \frac{\mu_3}{2}\right\}. \tag{2.16}$$

These inequalities are slightly sharper than (2.9) and (2.10), and often permit us
to estimate the variance of $U(a)$.
We note furthermore that the variance of the compound Poisson distribution is

$$\sigma^2 + m \tag{2.17}$$

as compared with the variance m of the corresponding simple Poisson distribution.
Finally the following important property of the compound distribution may be
mentioned: Consider two independent variates X and Y distributed according to
two compound Poisson distributions $\{\pi_n^{(1)}\}$ and $\{\pi_n^{(2)}\}$ associated with the c.d.f.’s
$U_1(a)$ and $U_2(a)$, respectively. Then the variate X + Y is distributed according to
a compound Poisson law $\{\pi_n\}$ associated with the c.d.f. $U(a) = U_1(a) * U_2(a)$
(cf. (1.4)).
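It suffices to note that $U_i(a) = 0$ for $a < 0$, so that

$$U(a) = \int_0^a U_1(a - s)\, dU_2(s);$$

therefore, after a permitted change of the order of integration,

$$\pi_n = \int_0^\infty e^{-a} \frac{a^n}{n!}\, dU(a) = \int_0^\infty dU_2(s) \int_s^\infty e^{-a} \frac{a^n}{n!}\, d_a U_1(a - s) = \int_0^\infty dU_2(s) \int_0^\infty e^{-(s+t)} \frac{(s + t)^n}{n!}\, dU_1(t)$$

$$= \sum_{k=0}^{n} \int_0^\infty e^{-t} \frac{t^k}{k!}\, dU_1(t) \int_0^\infty e^{-s} \frac{s^{n-k}}{(n - k)!}\, dU_2(s) = \sum_{k=0}^{n} \pi_k^{(1)} \pi_{n-k}^{(2)};$$

the last expression represents the convolution of $\{\pi_n^{(1)}\}$ and $\{\pi_n^{(2)}\}$.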

Neyman’s distributions of type A with two parameters are special cases of a
compound Poisson process where $U(a)$ is a step function with jumps at equidistant
places, the jumps being given by a simple Poisson distribution $\{\pi(n; \lambda)\}$.
Now the convolution of two such distributions is again a simple Poisson distribution
$\{\pi(n; 2\lambda)\}$ with jumps at the same places; hence the convolution of two
distributions of type A is again a similar distribution with one parameter doubled.

As mentioned before, the notion of a compound Poisson distribution is due to
Greenwood and Yule [6]. The time dependent compound Poisson process has
been the object of detailed investigations by J. Dubourdieu [2] and O. Lundberg
[9]. The latter has discussed also the problem of fitting the compound Poisson
process to empirical distributions.


3. The generalized Poisson distribution. Let $F(x)$ be an arbitrary c.d.f.
Then its n-fold convolution $F^{n*}(x)$ (cf. (1.5)) may be considered as a c.d.f.
depending on a parameter n. Choosing, for the latter, the simple Poisson distribution
(2.1) and performing the operation indicated in (1.1), we arrive at the
c.d.f. of the generalized Poisson law

$$G(x) = e^{-a} \sum_{n=0}^{\infty} \frac{a^n}{n!} F^{n*}(x). \tag{3.1}$$

If, in particular, $F(x)$ is the unitary function (1.6), we have the ordinary Poisson
law

$$e^{-a} \sum_{n=0}^{\infty} \frac{a^n}{n!} E^{n*}(x) = e^{-a} \sum_{n=0}^{[x]} \frac{a^n}{n!} \tag{3.2}$$

in its cumulative form.

The most frequently encountered application of the generalized Poisson dis- 
tribution is to problems of the following type. Consider independent random 
events for which the simple Poisson distribution may be assumed, such as: 
telephone calls, the occurrence of claims in an insurance company, fire accidents, 
sickness, and the like. With each event there may be associated a random 
variable X. Thus, in the above examples, X may represent the length of the 
ensuing conversation, the sum under risk, the damage, the cost (or length) of 
hospitalization, respectively. To mention an interesting example of a different 
type, A. Einstein Jr. [5] and G. Polya [13, 14] have studied a problem arising out 
of engineering practice connected with the building of dams, where the events 
consist of the motions of a stone at the bottom of a river; the variable X is the 
distance through which the stone moves down the river. 

Now, if $F(x)$ is the c.d.f. of the variable X associated with a single event, then
$F^{n*}(x)$ is the c.d.f. of the accumulated variable associated with n events. Hence
(3.1) is the probability law of the sum of the variables (sum of the conversation
times, total sum paid by the company, total damage, total distance travelled by
the stone, etc.).

In view of the above examples, it is not surprising that the law (3.1), or special 
cases of it, have been discovered, by various means and sometimes under dis- 
guised forms, by many authors. Quite recently Satterthwaite [16] was led to it 
(in the above simple form) from problems in insurance. Related (but less ele- 
gant) considerations may be found in a paper by W. G. Ackermann [1]. Simple 
as they are, the above considerations leading to (3.1) furnish a complete solution 
of the problem in all the cases mentioned. Unfortunately, the special features 
of the problems often so overshadow the essential point, that one is often led to 
unnecessarily complicated and incomplete solutions. As an example of the diffi- 
culties in considering special cases we mention that Polya [13, 14] was led to a 
partial differential equation of the hyperbolic type, which conceals the elementary 
nature of the problem. 

If $F(x)$ is itself a Poisson c.d.f., (3.1) reduces to (2.7). Thus Neyman’s distribution
of type A depending on two parameters is both a compound and a generalized
Poisson distribution. We shall later on see that the generalized Poisson distribution
plays an even more important role in Neyman’s theory.
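The double character of the type A distribution can be illustrated numerically. The sketch below evaluates the series (2.7) directly and, separately, builds the generalized Poisson law (3.1) with a Poisson $F$ of parameter c by repeated discrete convolution, taking the parameter a of (3.1) equal to $\lambda$. The truncation after 200 terms and all function names are assumptions made for the example.

```python
import math


def type_a_pmf_series(n, lam, c, terms=200):
    """Formula (2.7), with the sum over k truncated after `terms` summands."""
    q = lam * math.exp(-c)
    total = 0.0
    weight = 1.0  # q**k / k!, updated iteratively
    for k in range(terms):
        if k > 0:
            weight *= q / k
        total += weight * k**n  # note 0**0 == 1 in Python
    return math.exp(-lam) * c**n / math.factorial(n) * total


def convolve(f, g):
    """First len(f) probabilities of the convolution of two laws on 0, 1, 2, ..."""
    return [sum(f[i] * g[j - i] for i in range(j + 1)) for j in range(len(f))]


def type_a_pmf_convolution(lam, c, size, terms=200):
    """Formula (3.1) with F a Poisson law of parameter c, first `size` values."""
    f = [math.exp(-c) * c**j / math.factorial(j) for j in range(size)]
    power = [1.0] + [0.0] * (size - 1)  # zero-fold convolution: unit mass at 0
    weight = math.exp(-lam)             # e^{-lam} lam^k / k! for k = 0
    result = [0.0] * size
    for k in range(terms):
        if k > 0:
            power = convolve(power, f)
            weight *= lam / k
        result = [r + weight * p for r, p in zip(result, power)]
    return result


if __name__ == "__main__":
    lam, c = 2.0, 1.5  # hypothetical parameter values
    by_convolution = type_a_pmf_convolution(lam, c, size=8)
    for n, value in enumerate(by_convolution):
        print(n, value, type_a_pmf_series(n, lam, c))
```

Truncating each law to its first few values does not affect those values, since a convolution of laws on the non-negative integers at the point j involves only the points up to j.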

The main properties of (3.1) are easily derived using characteristic functions.
If $\varphi(z)$ is the characteristic function of $F(x)$, the characteristic function of $G(x)$
is

$$\psi(z) = e^{-a + a\varphi(z)}. \tag{3.3}$$

Accordingly the r-th semi-invariant of $G(x)$ equals the r-th moment of $F(x)$ multiplied
by a. Moreover it is readily seen that the r-th convolution of $G(x)$ with
itself is again a function of type (3.1), only with a replaced by ra. Neyman’s
Proposition II is a special case of this remark.
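Both remarks can be verified numerically when $F$ is concentrated on the non-negative integers. The sketch below computes the generalized Poisson probabilities by the recursion usually attributed to Panjer, which is a later device and not the method of this paper, and then checks that the mean and variance equal a times the first and second moments of $F$ and that the convolution of $G$ with itself corresponds to the parameter 2a. The distribution f and the value of a are hypothetical.

```python
import math


def generalized_poisson_pmf(a, f, size):
    """First `size` probabilities of (3.1) when F has probabilities f[0], f[1], ...

    Uses the recursion g_0 = exp(-a (1 - f_0)),
    g_n = (a / n) * sum_{j=1..n} j f_j g_{n-j}  (Panjer's recursion, Poisson case).
    """
    g = [math.exp(-a * (1.0 - f[0]))]
    for n in range(1, size):
        upper = min(n, len(f) - 1)
        g.append(a / n * sum(j * f[j] * g[n - j] for j in range(1, upper + 1)))
    return g


def convolve(p, q, size):
    """First `size` probabilities of the convolution of two laws on 0, 1, 2, ..."""
    return [sum(p[i] * q[n - i] for i in range(n + 1)) for n in range(size)]


if __name__ == "__main__":
    a = 2.0                    # hypothetical expected number of events
    f = [0.0, 0.5, 0.3, 0.2]   # hypothetical law of X on the values 1, 2, 3
    g = generalized_poisson_pmf(a, f, 80)
    mean = sum(n * p for n, p in enumerate(g))
    var = sum(n * n * p for n, p in enumerate(g)) - mean**2
    print("mean:", mean, " a * E(X):", a * sum(j * fj for j, fj in enumerate(f)))
    print("variance:", var, " a * E(X^2):", a * sum(j * j * fj for j, fj in enumerate(f)))
```

With $F$ equal to the unitary function (1.6), that is f = [0, 1], the recursion reduces to the simple Poisson probabilities, in agreement with (3.2).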


4. Neyman’s contagious distributions. As an illustration of the general
applicability of the operation (1.1) we shall consider the typical example treated
by Neyman. Consider the distribution of larvae in a field. The field is divided
into plots of equal areas and we are interested in the probability $c_k$ that exactly
k larvae are found in a certain plot. Now we assume with Neyman:

(i) The larvae may come from various litters. It is assumed that the probability
that exactly $\nu$ litters are represented on our plot is given by the simple
Poisson distribution⁴ (2.1). (ii) The probability that there are exactly n survivors
is the same for all litters and will be denoted by $p(n)$. (iii) If, in any
particular litter, there are exactly n survivors, the probability that k of them are
found on the plot under observation is given by the binomial distribution. We
shall write the latter in its cumulative form

$$B(k, n, u) = \sum_{j=0}^{n} \binom{n}{j} u^j (1 - u)^{n-j} E^{j*}(k) \tag{4.1}$$

(cf. (1.6)). (iv) The parameter u in (4.1) is characteristic for any particular
litter (and varies, in particular, with the position of the litter relative to the particular
plot under observation). The c.d.f. of u (which characterizes the distribution
of litters in the field) is supposed to be known and will be denoted by $F(u)$.
The litters are statistically independent.

Now for any particular litter the probability that at most k survivors will be
in the plot under observation is given by

$$L(k, u) = \sum_{n=0}^{\infty} p(n) B(k, n, u), \tag{4.2}$$

which is a special case of (1.2). Here u is the parameter for the litter picked out.
Accordingly, the probability that at most k survivors from any one litter will be
found on our plot is

$$L(k) = \int_0^1 L(k, u)\, dF(u), \tag{4.3}$$

and this is the second application of the operation (1.1). Since any number of
litters may be represented on our plot, the final expression for the probability
that at most k larvae will be found on our plot is obtained in the form of a
generalized Poisson c.d.f.

$$C(k) = e^{-a} \sum_{\nu=0}^{\infty} \frac{a^\nu}{\nu!} L^{\nu*}(k). \tag{4.4}$$

This is the desired c.d.f. For the desired probability $c_k$ we have $c_k = C(k) - C(k - 1)$.

4 Actually Neyman at first assumes the number of litters in the field to be finite and
considers therefore the binomial instead of the Poisson distribution. Later, however, a
passage to the limit is performed which is equivalent to the above assumption. It will be
seen that in the following consideration the Poisson distribution may be replaced by any
other distribution.

We specialize now with Neyman the assumption (ii) to the effect that the distribution
function $\{p(n)\}$ is a Poisson distribution

$$p(n) = e^{-\lambda} \frac{\lambda^n}{n!}. \tag{4.5}$$

The distribution (4.2) then becomes the c.d.f. of a generalized Poisson distribution,
since $B(x, n, u) = B^{n*}(x, 1, u)$.
The simplest special case arises when all litters are characterized by the same
value of the parameter, say $u = u_0$. Then $F(u) = E(u/u_0)$ and $L(k) = L(k, u_0)$.
Writing $L'(k) = L(k) - L(k - 1)$ for the probability that exactly k survivors
from any one litter will be found on our plot, we have

$$L'(k) = \sum_{n=k}^{\infty} e^{-\lambda} \frac{\lambda^n}{n!} \binom{n}{k} u_0^k (1 - u_0)^{n-k} = e^{-\lambda u_0} \frac{(\lambda u_0)^k}{k!}. \tag{4.6}$$

The c.d.f. (4.4) then reduces to the form (2.7). Similarly, when $F(u)$ is a step
function we arrive at Neyman’s more parametric distributions of type A.
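The identity (4.6) states that a Poisson number of survivors, each found on the plot independently with probability $u_0$, yields again a Poisson number with parameter $\lambda u_0$. A minimal numerical check follows; the parameter values are hypothetical, and the truncation of the series after 100 terms is an assumption that suits small k and moderate $\lambda$ only.

```python
import math


def survivors_on_plot_pmf(k, lam, u0, terms=100):
    """Left-hand side of (4.6), summing n from k to k + terms - 1."""
    total = 0.0
    for n in range(k, k + terms):
        p_n = math.exp(-lam) * lam**n / math.factorial(n)  # p(n) of (4.5)
        total += p_n * math.comb(n, k) * u0**k * (1 - u0) ** (n - k)
    return total


def poisson_pmf(k, mean):
    """Right-hand side of (4.6) when mean = lam * u0."""
    return math.exp(-mean) * mean**k / math.factorial(k)


if __name__ == "__main__":
    lam, u0 = 3.0, 0.25  # hypothetical litter size parameter and plot probability
    for k in range(5):
        print(k, survivors_on_plot_pmf(k, lam, u0), poisson_pmf(k, lam * u0))
```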
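If $F(u) = u$ for $0 < u < 1$ (rectangular distribution), then $\int_0^1 B(k, n, u)\, dF(u)$ has
only jumps of magnitude $1/(n + 1)$, and

$$L'(k) = \frac{1}{\lambda} \sum_{n=k}^{\infty} e^{-\lambda} \frac{\lambda^{n+1}}{(n + 1)!}. \tag{4.7}$$

This leads to Neyman’s function of type B. The characteristic function of
(4.7) is readily seen to be

$$\varphi(z) = \frac{e^{\lambda(e^{iz} - 1)} - 1}{\lambda(e^{iz} - 1)},$$

so that the characteristic function of the final c.d.f. $C(k)$ becomes

$$\exp\left\{a \left(\frac{e^{\lambda(e^{iz} - 1)} - 1}{\lambda(e^{iz} - 1)} - 1\right)\right\}$$

in agreement with Neyman’s formula.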
5. The nature of contagion. It is well known that the simple Poisson distribution
describes mutually independent events; in other words, with a Poisson
distribution the numbers of events in two non-overlapping time intervals are
uncorrelated and the occurrence of an event has no influence on the probability 
of occurrence of further events. Accordingly, the compound Poisson process 
also applies to independent and not contagious events. With really contagious 
events (as, for example, with epidemics) the occurrence of each event increases 
(or decreases) the probability of further events. Greenwood and Yule [6] de- 
veloped a very general scheme for such events but, due to the very generality, 
their formulas became too complex for practical applications. They considered 
the compound Poisson process, and, in particular, the Polya distribution (2.5), 
as an alternative hypothesis. Accordingly, they interpreted the good fit of that 
distribution to accident statistics as indicating that there was no contagion but 
that proneness to accidents varies with the person. 

Considering a very similar problem, Polya and Eggenberger were later on led 
to consider a special model of true contagion. This turns out to be the simplest 
case of the general Greenwood-Yule scheme, but this had been overlooked by 
them. Curiously enough, Polya was led exactly to the distribution (2.5) which 
Greenwood and Yule found as an alternative to contagion. It is therefore seen 
that, contrary to a wide-spread opinion, an excellent fit of Polya’s distribution to 
observations is not necessarily indicative of any phenomenon of contagion in the 
mechanism behind the observed distribution. In order to decide whether or not 
there is contagion, it is not sufficient to consider the distribution of events, but 
a detailed study of the correlation between various time intervals is necessary.⁵

The double interpretation of Polya’s distribution leads to an understanding
of the compound Poisson distribution. To the observer the compound Poisson
distribution will always appear “contagious”; however, this contagion is not inherent
in any phenomenon in nature, but simply in our method of sampling. As a
matter of fact, with a compound Poisson distribution the parameter a is a random
variable.⁶ Its a priori c.d.f. in the total population is $\text{Prob}\{a \le x\} = U(x)$.
Now if, for any particular sample, the observed number of events is n, then the
a posteriori c.d.f. of a in that sample is given by

$$\text{Prob}\{a \le x\} = \frac{\displaystyle\int_0^x e^{-s} \frac{s^n}{n!}\, dU(s)}{\displaystyle\int_0^\infty e^{-s} \frac{s^n}{n!}\, dU(s)}.$$

5 For such studies cf. Newbold [10] and Lundberg [9]. For some generalizations of the
Polya-Eggenberger scheme see Kitagawa [17] and Rosenblatt [15].

6 It will be noticed that here a is actually a random characteristic in the population and
can be sampled. We are therefore not guilty of the absurdity which is usually connected
with the unfortunate use of Bayes’ theorem, when a constant is regarded as a random variable.
If the output of a machine is distributed according to a Poisson distribution, its
parameter is a constant, characteristic of that machine. Regarding it as a random variable
means to consider the collective of non-existing similar machines and making predictions
for them, whereas we are interested in the one machine only.
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This is additional information enabling us to make better predictions for the 
future or estimates of other properties of the sample. For example, if n is very 
large, there is a considerable probability that the mean of a in the sample exceeds 
that of the total population: accordingly, we shall expect that also in the future 
the number of >vents in our sample will be comparatively large. In other 
words, although he events themselves are strictly independent we have an 
apparent contagic \ due to our method of observation. 

It is hardly necessary to point out that the contagion studied by Neyman is
of the type just described. Any inhomogeneity of a population of type (1.1)
will lead to such an apparent contagion. However, that the Polya-Eggenberger
distribution is a member of our class of contagious distributions must be regarded
as an accident, due to the possibility of its being interpreted as a compound
Poisson distribution.


ON THE CONSTRUCTION OF SETS OF ORTHOGONAL LATIN SQUARES¹


By H. B. Mann 
Columbia University 


1. Introduction. 

An m-sided Latin square is an arrangement of m symbols into a square in such
a way that no row and no column contains any symbol twice. Two Latin
squares are called orthogonal if, when one is superimposed upon the other, every
pair of symbols occurs only once. For instance the squares


          A B C          α β γ
          B C A          γ α β
          C A B          β γ α


are orthogonal. The resulting square is


          Aα Bβ Cγ
          Bγ Cα Aβ
          Cβ Aγ Bα.


A pair of orthogonal Latin squares is called a Graeco-Latin square. A method
has not yet been found by which all possible sets of mutually orthogonal squares
can be constructed. However, methods are available for constructing certain
special sets, and although we cannot obtain all possible sets with these methods
they yield a great variety of designs.

To understand these methods we shall have to use certain fundamental con- 
cepts of the theory of numbers. In the following we shall deal therefore only 
with integers and all symbols used will denote only integers. 

Let a, b, m denote certain integers. We say


          a ≡ b (m),


(in words a is congruent to b modulo m) if a − b is divisible by m.

Such congruences can be treated like equations. For instance: If a ≡ b (m),
then a + c ≡ b + c (m), ac ≡ bc (m). The proofs of these statements are
obvious from the definition of a ≡ b (m).

If a ≡ b (m), and c ≡ d (m), then ac ≡ bd (m), and a + c ≡ b + d (m).

Proof: According to our definition we have


          a − b = λ_1 m,     a = b + λ_1 m,
          c − d = λ_2 m,     c = d + λ_2 m.


Therefore ac = bd + m(λ_2 b + λ_1 d + λ_1 λ_2 m) and a + c = b + d + m(λ_1 + λ_2). Hence
ac − bd and (a + c) − (b + d) are divisible by m.

1 An expository paper presented, at the invitation of the program committee, on
September 12, 1943 at the Sixth Summer Meeting of the Institute, at the New Jersey College
for Women, Rutgers University, New Brunswick, N. J.

We have to be more careful with the division of congruences but we shall
prove the following rule.

If a is prime to m and ab ≡ ac (m) then b ≡ c (m).

Proof: a(b − c) = λm, by hypothesis. The left side of this equation is
divisible by m. Since a is prime to m, b − c must be divisible by m.

This rule means that we may cancel as in an ordinary equation as long as the
cancelled factor is prime to the modulus.

Every number is congruent to one of the numbers 0, 1, 2, ..., m − 1, because
if a is any number we can find a number b such that 0 ≤ a − bm = j < m.

We shall now add, subtract and multiply mod m. That means we add, subtract
and multiply in the ordinary way but shall always replace every number
by its smallest non-negative remainder. Thus for instance


          2 + 4 ≡ 1 (5)
          2·4 ≡ 3 (5).
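
The cancellation rule can be checked by brute force for small moduli. The following Python lines are only an illustrative sketch; the function name `cancellation_holds` is a hypothetical helper introduced here and is not part of the paper.

```python
from math import gcd


def cancellation_holds(a, m):
    """Return True if a*b ≡ a*c (mod m) always forces b ≡ c (mod m)."""
    for b in range(m):
        for c in range(m):
            if (a * b - a * c) % m == 0 and (b - c) % m != 0:
                return False
    return True


# Cancelling the factor a is safe exactly when a is prime to the modulus m.
for m in (5, 6, 12):
    for a in range(1, m):
        assert cancellation_holds(a, m) == (gcd(a, m) == 1)

# The two numerical examples of the text, reduced to the smallest remainder.
sum_mod_5 = (2 + 4) % 5
product_mod_5 = (2 * 4) % 5
```

For m = 6 the factor 2 cannot be cancelled: 2·3 ≡ 2·0 (6) although 3 is not congruent to 0.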


2. Complete sets of m-sided orthogonal Latin squares, where m is prime.
Now let p be a prime number. We write down the following design


          0            1                ...   p−1
          j            1 + j            ...   p−1 + j
  L_j =   2j           1 + 2j           ...   p−1 + 2j              0 < j ≤ p − 1
          ...          ...              ...   ...
          (p−1)j       1 + (p−1)j       ...   p−1 + (p−1)j


where all expressions are to be taken mod p, that is we replace every number in
this square by its smallest remainder mod p. We shall show that L_j is a Latin
square. Here the rows and columns are numbered from 0 to p − 1. Assume
that the kth row (0 ≤ k ≤ p − 1) contains a number twice. Then we would
have


          a + kj ≡ b + kj (p) with a ≢ b (p).


But from this we obtain a ≡ b (p), which is a contradiction. Now assume that
a column contains a number twice. Then we would have


          a + kj ≡ a + k'j (p), with k ≢ k' (p)

but from this we have

          kj ≡ k'j (p),

and since j is prime to p

          k ≡ k' (p),

which is again a contradiction.

We can obtain p − 1 such Latin squares corresponding to the p − 1 values
which j can take.

We shall show that L_j is orthogonal to L_i if i ≠ j. If this were not true we
would have the same pair of numbers occurring in two different boxes of the
square which results from the superimposition of L_j on L_i. Let m, n be such a
pair and assume that it occurs in the αth row and βth column and in the γth row
and δth column of the resulting square. Then m would occur in L_j in the αth
row and βth column and in the γth row and δth column. Hence we would have


          (i)   β + αj ≡ m ≡ δ + γj (p),

and similarly

          (i')  β + αi ≡ n ≡ δ + γi (p).


If we subtract the second congruence from the first we obtain

          α(j − i) ≡ γ(j − i) (p),

but j < p and i < p and j ≠ i. Hence j − i ≢ 0 (p) and we may therefore
divide by (j − i). This gives

          α ≡ γ (p).

Substituting this in (i) we obtain

          β ≡ δ (p).

Hence the two boxes must be the same. We have therefore the following
theorem:

THEOREM 1: If p is a prime number and

          0            1                ...   p−1
          j            1 + j            ...   p−1 + j
  L_j =   ...          ...              ...   ...
          (p−1)j       1 + (p−1)j       ...   p−1 + (p−1)j

then L_1, L_2, ..., L_{p−1} is a set of p − 1 orthogonal Latin squares.


As an application we can write down a set of 4 orthogonal Latin squares of 
side 5 


L_1   L_2   L_3   L_4
01234 01234 01234 01234 
12340 23401 34012 40123 
23401 40123 12340 34012 
34012 12340 40123 23401 
40123 34012 23401 12340 
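
The construction of Theorem 1 is easy to mechanize. The short Python sketch below builds the squares L_j for p = 5 and checks the Latin and orthogonality properties directly; the function names (`latin_square`, `is_latin`, `are_orthogonal`) are illustrative choices made here, not notation from the paper.

```python
def latin_square(p, j):
    """Square L_j of Theorem 1: the entry in row k, column a is (a + k*j) mod p."""
    return [[(a + k * j) % p for a in range(p)] for k in range(p)]


def is_latin(square):
    """Every row and every column must contain each symbol exactly once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


def are_orthogonal(x, y):
    """Superimpose x on y and check that no ordered pair occurs twice."""
    n = len(x)
    pairs = {(x[r][c], y[r][c]) for r in range(n) for c in range(n)}
    return len(pairs) == n * n


# The four squares of side 5 (j = 1, 2, 3, 4).
squares = [latin_square(5, j) for j in range(1, 5)]
all_latin = all(is_latin(s) for s in squares)
all_orthogonal = all(
    are_orthogonal(squares[i], squares[j])
    for i in range(4) for j in range(i + 1, 4)
)
```

The rows of `squares[1]` reproduce the second square printed above (01234, 23401, 40123, 12340, 34012).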


A further simplification can be achieved if we know a primitive root mod p.
A primitive root is a remainder a mod p such that every other remainder except 0
is equal to a power of a mod p. For example, 3 is a primitive root mod 7, for
3^0 ≡ 1 (7), 3^1 ≡ 3 (7), 3^2 ≡ 2 (7), 3^3 ≡ 6 (7), 3^4 ≡ 4 (7), 3^5 ≡ 5 (7).

For any number a not divisible by p we must have a^{p−1} ≡ 1 (p). We will prove
this equation for primitive roots only, since we do not need the general case. Let a
be a primitive root and assume that


          a^q ≡ b ≡ a^t (p), with t < q < p − 1.

Then we would have

          a^{q−t} ≡ a^{p'} ≡ 1 (p), with p' < p − 1.

Hence we can obtain at most p − 2 different remainders a^0, a^1, ..., a^{p'} and a
would not be a primitive root.

We now form


          0              1                  ...   p−1
          a^i            1 + a^i            ...   p−1 + a^i
  L_i =   a^{i+1}        1 + a^{i+1}        ...   p−1 + a^{i+1}          (i = 0, 1, ..., p − 2)
          ...            ...                ...   ...
          a^{i+p−2}      1 + a^{i+p−2}      ...   p−1 + a^{i+p−2}


Exactly as in the case of the L_j of Theorem 1 it can be shown that L_i is orthogonal
to L_j if i ≠ j. For k < p − 1 the k-th row of L_i equals the (k − 1)st
row of L_{i+1} and since a^{p−1} ≡ 1 (p) the last row of L_{i+1} equals the first row of L_i.
Hence L_{i+1} is obtained from L_i by a cyclical permutation of the (p − 1) last
rows. It is then only necessary to construct the first square. The others can
be obtained by a cyclic permutation of the (p − 1) last rows. We shall exemplify
this by constructing a set of 6 seven-sided orthogonal squares.


          L_1        L_2        L_3
        0123456    0123456    0123456
        1234560    3456012    2345601
        3456012    2345601    6012345
        2345601    6012345    4560123
        6012345    4560123    5601234
        4560123    5601234    1234560
        5601234    1234560    3456012

          L_4        L_5        L_6
        0123456    0123456    0123456
        6012345    4560123    5601234
        4560123    5601234    1234560
        5601234    1234560    3456012
        1234560    3456012    2345601
        3456012    2345601    6012345
        2345601    6012345    4560123

In the theory of numbers it is shown that a primitive root exists for every
prime number. If p is not too large a primitive root can easily be found by trial
and error. We give a list of primitive roots for all primes under 30:


        Prime number    Primitive root
              3               2
              5               2
              7               3
             11               2
             13               2
             17               3
             19               2
             23               5
             29               2


In computing the first column of the first square it is not necessary to actually
compute all powers of the primitive root. We can take advantage of the fact that a
congruence may be multiplied by a number. Thus, for instance, for the
11-sided square we have 2^0 ≡ 1 (11), 2^1 ≡ 2 (11), 2^2 ≡ 4 (11), 2^3 ≡ 8 (11),
2^4 ≡ 5 (11), 2^5 ≡ 2·5 ≡ 10 (11), but 10 ≡ −1 (11), hence we have
without further computation 2^6 ≡ −2 ≡ 9 (11), 2^7 ≡ −4 ≡ 7 (11),
2^8 ≡ −8 ≡ 3 (11), 2^9 ≡ −5 ≡ 6 (11).
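
The trial-and-error search for a primitive root and the cyclic construction of the squares can be sketched in a few lines of Python. This is an illustration under our own naming (`is_primitive_root`, `smallest_primitive_root`, `squares_from_primitive_root` are hypothetical helpers); it simply takes the smallest primitive root for each prime, which happens to agree with the list above.

```python
def is_primitive_root(a, p):
    """True if a^0, a^1, ..., a^(p-2) run through all non-zero remainders mod p."""
    return {pow(a, e, p) for e in range(p - 1)} == set(range(1, p))


def smallest_primitive_root(p):
    """Trial and error over a = 2, 3, ... (intended for odd primes p)."""
    return next(a for a in range(2, p) if is_primitive_root(a, p))


def squares_from_primitive_root(p, a):
    """The p - 1 squares obtained by cyclically permuting the last p - 1 rows."""
    offsets = [pow(a, e, p) for e in range(p - 1)]  # a^0, a^1, ..., a^(p-2)
    top = list(range(p))
    squares = []
    for i in range(p - 1):
        rotated = offsets[i:] + offsets[:i]
        squares.append([top] + [[(c + s) % p for c in range(p)] for s in rotated])
    return squares


def are_orthogonal(x, y):
    """No ordered pair may occur twice when x is superimposed on y."""
    n = len(x)
    return len({(x[r][c], y[r][c]) for r in range(n) for c in range(n)}) == n * n


roots = {p: smallest_primitive_root(p) for p in (3, 5, 7, 11, 13, 17, 19, 23, 29)}
powers_of_2_mod_11 = [pow(2, e, 11) for e in range(10)]
seven_sided = squares_from_primitive_root(7, 3)
```

Here `seven_sided[0]` corresponds to the square called L_1 in the example, `seven_sided[1]` to L_2, and so on.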


3. Complete sets of m-sided orthogonal Latin squares, where m is the power 
of a prime. 

We have seen that we can always construct m − 1 orthogonal Latin squares
if m is a prime number. We shall show how to construct m − 1 orthogonal
Latin squares if m is the power of a prime. However, if we need only a
Graeco-Latin square of side m and if m is odd, then we can use the following theorem:

THEOREM 2: If m is odd, then the squares


          0          1              ...   m−1
          1          1 + 1          ...   m−1 + 1
  L_1 =   2          1 + 2          ...   m−1 + 2
          ...        ...            ...   ...
          m−1        1 + m−1        ...   m−1 + m−1

          0          1              ...   m−1
          2          1 + 2          ...   m−1 + 2
  L_2 =   2·2        1 + 2·2        ...   m−1 + 2·2
          ...        ...            ...   ...
          2(m−1)     1 + 2(m−1)     ...   m−1 + 2(m−1)


are orthogonal.

The proof is similar to the proof of Theorem 1. We have to use the fact that
2 is prime to m.
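
A direct check of Theorem 2 for a few odd sides, including sides that are not prime, may be sketched as follows. The function names are our own and purely illustrative. The sketch also shows why the oddness of m is needed: for even m the second design is no longer a Latin square.

```python
def cyclic_square(m, j):
    """Design with entry (c + j*k) mod m in row k, column c."""
    return [[(c + j * k) % m for c in range(m)] for k in range(m)]


def is_latin(square):
    """Every row and every column must contain each symbol exactly once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


def are_orthogonal(x, y):
    """No ordered pair may occur twice when x is superimposed on y."""
    n = len(x)
    return len({(x[r][c], y[r][c]) for r in range(n) for c in range(n)}) == n * n


def graeco_latin_pair(m):
    """True if L_1 and L_2 of Theorem 2 form a Graeco-Latin square of side m."""
    l1, l2 = cyclic_square(m, 1), cyclic_square(m, 2)
    return is_latin(l1) and is_latin(l2) and are_orthogonal(l1, l2)


odd_sides_ok = all(graeco_latin_pair(m) for m in (3, 5, 7, 9, 15))
even_side_fails = not graeco_latin_pair(6)
```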

We shall now prove the following statement: For every remainder a ≢ 0 (p)
there exists another remainder a^{−1} such that a·a^{−1} ≡ 1 (p).

Proof: We form the sequence a, a^2, ..., a^n, .... Since there is only a finite
number of remainders, there must exist 2 values i and j such that


          a^i ≡ a^j (p).

Let i > j. Then since a is prime to p we may divide by a^j. Putting i − j = d,
we obtain


          a^{i−j} ≡ a^d ≡ 1 (p).


Hence we may take a^{−1} = a^{d−1} and our statement is proved. Thus we see that
the system of remainders mod p with respect to addition as well as with respect
to multiplication if 0 is excluded satisfies the following postulates:

(1) For every pair of elements A, B there is defined a product A·B within the
system such that for any 3 elements A, B and C


          A·(B·C) = (A·B)·C (associative law)


The “multiplication” may be any sort of composition. For example, either
addition or multiplication of remainder classes.

(2) There exists a unit element 1 such that


          A·1 = 1·A = A.

(3) For every A in the system there exists an element A^{−1} such that

          A·A^{−1} = A^{−1}·A = 1.


The unit element will be 0 if we consider the remainder classes with addition
as composition. It will be 1 if multiplication is the composition. The inverse
of a is −a for the additive system, a^{−1} for the multiplicative system.

A system satisfying (1), (2) and (3) is called a group. The property A·B =
B·A is usually not postulated. If a group fulfills this condition, then it is called
a commutative group or an Abelian group. A group can be defined by its
generating elements. For example, let G be generated by the elements P, Q with
the relations P^2 = 1, Q^3 = 1 and PQ = Q^2 P. We then obtain the elements of G
as 1, P, Q, PQ, PQ^2, Q^2. The rules for the multiplication can be written down in
the form of a table:


          1       P       Q       PQ      PQ^2    Q^2
          P       1       PQ      Q       Q^2     PQ^2
          Q       PQ^2    Q^2     P       PQ      1
          PQ      Q^2     PQ^2    1       Q       P
          PQ^2    Q       P       Q^2     1       PQ
          Q^2     PQ      1       PQ^2    P       Q


By inspection one can see that taking the elements of our group as symbols
the multiplication table forms a Latin square. For instance, if we identify P
with 2, Q with 3, etc. we obtain from the table above


          1 2 3 4 5 6
          2 1 4 3 6 5
          3 5 6 2 4 1
          4 6 5 1 3 2
          5 3 2 6 1 4
          6 4 1 5 2 3


We shall prove that this is generally true. Let the group G consist of the
elements A_1, ..., A_m. We write down the multiplication table of the group:


          A_1      A_2         ...   A_m
          A_2      A_2 A_2     ...   A_2 A_m
          ...      ...         ...   ...
          A_m      A_m A_2     ...   A_m A_m


Suppose this is not a Latin square. Then an element will occur twice in at least
one row or at least one column, that is, we should have either


          A_j A_i = A_j A_k, for i ≠ k
  or      A_j A_i = A_k A_i, for j ≠ k.


Multiplying the first equation by A_j^{−1} on the left, we obtain A_i = A_k. Hence
i = k. Similarly in the second case j = k, contrary to our assumption.
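
The multiplication table of the group of order 6 can be generated mechanically from the defining relations. In the Python sketch below (an illustration only; the pair encoding and the names `multiply`, `ELEMENTS` are our own assumptions) the pair (a, b) stands for the element P^a Q^b, and the relation PQ = Q^2 P is used in the equivalent form QP = PQ^2 to move P past powers of Q.

```python
# Elements in the order 1, P, Q, PQ, PQ^2, Q^2; the pair (a, b) stands for P^a Q^b.
ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1), (1, 2), (0, 2)]


def multiply(x, y):
    """Product (P^a Q^b)(P^c Q^d), using Q^b P = P Q^(-b), which follows from PQ = Q^2 P."""
    a, b = x
    c, d = y
    sign = -1 if c % 2 else 1
    return ((a + c) % 2, (sign * b + d) % 3)


number = {element: index + 1 for index, element in enumerate(ELEMENTS)}
table = [[number[multiply(x, y)] for y in ELEMENTS] for x in ELEMENTS]


def is_latin(square):
    """Every row and every column must contain each of 1, ..., n exactly once."""
    n = len(square)
    symbols = set(range(1, n + 1))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok
```

With the identification 1, P, Q, PQ, PQ^2, Q^2 → 1, ..., 6 the computed `table` agrees with the numerical square printed above.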

Two groups G and Ḡ are called isomorphic if we can map G into Ḡ in such a
way that the mapping is not disturbed by multiplication. That is, if A is mapped
on Ā and B on B̄ and if AB = C and ĀB̄ = C̄, then C must be mapped on C̄.
Such a mapping is called an isomorphism. If Ḡ = G then the mapping is called
an automorphism. For instance, if we consider the remainder system mod m
with addition as composition and j is any remainder, then the mapping ā = ja
is an automorphism. For if


          a + b ≡ c (m)
  then    aj + bj ≡ cj (m).


Some automorphisms establish a 1-to-1 correspondence between the elements
of G. For instance, in the above example if j is prime to m the correspondence
is bi-unique (that is only one element is mapped on any element of Ḡ) because if

          aj ≡ bj (m),

and j is prime to m then

          a ≡ b (m).

If j is not prime to m, the mapping would not be unique although it would still
be an automorphism. From now on we shall consider only automorphisms
which establish a 1-to-1 correspondence between the elements of G.
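
The example of the mapping a → ja can be verified numerically. The following Python sketch uses helper names of our own choosing (`preserves_addition`, `is_one_to_one`) and checks, for the modulus 12, that the mapping always respects addition but is 1-to-1 only when j is prime to m.

```python
from math import gcd


def preserves_addition(m, j):
    """The image of a + b must equal the sum of the images of a and b (mod m)."""
    return all(
        (j * ((a + b) % m)) % m == (j * a + j * b) % m
        for a in range(m) for b in range(m)
    )


def is_one_to_one(m, j):
    """The mapping a -> j*a (mod m) is bi-unique if no two remainders share an image."""
    return len({(j * a) % m for a in range(m)}) == m


m = 12
always_additive = all(preserves_addition(m, j) for j in range(m))
one_to_one_multipliers = [j for j in range(m) if is_one_to_one(m, j)]
```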

Let S be such an automorphism and denote by A^S the element into which A
is mapped under the automorphism S. We put (A^S)^S = A^{S^2}, (A^{S^2})^S = A^{S^3} etc.
We also put A^{S^0} = A. We shall prove the following theorem:

Let S be an automorphism such that S, S^2, ..., S^q map no element into itself
except the element 1. Then the Latin squares


          1              A_2                 ...   A_m
          A_2^{S^j}      A_2^{S^j} A_2       ...   A_2^{S^j} A_m
  L_j =   ...            ...                 ...   ...                      (j = 0, 1, ..., q)
          A_m^{S^j}      A_m^{S^j} A_2       ...   A_m^{S^j} A_m


are orthogonal.

Proof: Assume that L_i is not orthogonal to L_j. Let L_{ij} be the resulting
square if L_i is superimposed on L_j. Then for some k and l and some r and s
we should have the same pair of elements in the kth row and lth column and
in the rth row and sth column. That is, we should have


          (1)   A_k^{S^i} A_l = A_r^{S^i} A_s
          (2)   A_k^{S^j} A_l = A_r^{S^j} A_s.

By taking the inverse elements it follows from (2) that

          (3)   A_l^{−1} (A_k^{S^j})^{−1} = A_s^{−1} (A_r^{S^j})^{−1}.

Multiplying (1) and (3) we obtain

          A_k^{S^i} (A_k^{S^j})^{−1} = A_r^{S^i} (A_r^{S^j})^{−1}.

Multiplying by (A_r^{S^i})^{−1} to the left, and by A_k^{S^j} to the right, we obtain

          (A_r^{S^i})^{−1} A_k^{S^i} = (A_r^{S^j})^{−1} A_k^{S^j}.

Since S^i and S^j are automorphisms we have

          (A_r^{−1} A_k)^{S^i} = (A_r^{−1} A_k)^{S^j}.

Assuming i > j, then

          [(A_r^{−1} A_k)^{S^j}]^{S^{i−j}} = (A_r^{−1} A_k)^{S^j}.

Because of i ≤ q, j ≤ q we have 0 < i − j ≤ q. By assumption therefore S^{i−j}
can leave only 1 fixed. Therefore

          (A_r^{−1} A_k)^{S^j} = 1.

Hence A_r^{−1} A_k = 1, that is A_r = A_k. But then also

          A_l = A_s.

Therefore r = k and l = s. Hence the two compartments of L_{ij} cannot be
different and our statement is proved.


We see therefore that we can construct a set of q + 1 orthogonal Latin squares
if we can find a group G and an automorphism S of G such that

          S, S^2, ..., S^q

maps no element into itself except the unit element. If q = m − 2 and we write


          1                  A_2                      A_2^S                      ...   A_2^{S^{m−2}}
          A_2^{S^i}          A_2^{S^i} A_2            A_2^{S^i} A_2^S            ...   A_2^{S^i} A_2^{S^{m−2}}
  L_i =   ...                ...                      ...                        ...   ...
          A_2^{S^{i+m−2}}    A_2^{S^{i+m−2}} A_2      A_2^{S^{i+m−2}} A_2^S      ...   A_2^{S^{i+m−2}} A_2^{S^{m−2}}


then the (k − 1)st row of L_{i+1} equals the k-th row of L_i and all squares may be
obtained from L_0 by a cyclical permutation of the rows.


We shall now consider commutative groups of prime power order p^n defined
by the relations


          P_1^p = P_2^p = ··· = P_n^p = 1,     P_i P_j = P_j P_i.

The elements of this group G have the form

          P_1^{e_1} ··· P_n^{e_n},     e_i = 0, 1, ..., p − 1.


We call P_1, ..., P_n a basis of G. We can easily change the basis. For instance
if P_1, ..., P_n is a basis then also P_1, P_1 P_2, ..., P_1 P_n is a basis. For every
expression we have


          P_1^{e_1} ··· P_n^{e_n} = P_1^{e_1 − e_2 − ··· − e_n} (P_1 P_2)^{e_2} ··· (P_1 P_n)^{e_n},


since G is commutative. Such a change in the basis defines an automorphism of
G at the same time. For let P̄_1, ..., P̄_n be the new basis. We can map

          P_1^{e_1} ··· P_n^{e_n}
  into    P̄_1^{e_1} ··· P̄_n^{e_n}.


On the other hand an automorphism is determined if we know on what elements
the basis elements are mapped.

It can be shown that every such group admits an automorphism S such that
S, S^2, ..., S^{p^n − 2} leaves no element fixed except 1. Hence we can always
construct a set of p^n − 1 orthogonal squares of side p^n if p is a prime. We shall
give these automorphisms explicitly for the groups of order 8, 9, 16, 25 and 27.

As an example let us construct 7 orthogonal 8 sided squares. We shall use
the group G generated by P, Q, R where P^2 = Q^2 = R^2 = 1. We use the
automorphism S where


          P^S = Q,     Q^S = R,     R^S = PQ.


We then have P^S = Q, P^{S^2} = R, P^{S^3} = PQ, P^{S^4} = P^S Q^S = QR, P^{S^5} =
Q^S R^S = PQR, P^{S^6} = P^S Q^S R^S = QRPQ = PR, P^{S^7} = P^S R^S = QPQ = P.

If we write the elements in the order 1, P, P^S, P^{S^2}, ..., P^{S^6} we obtain the
following multiplication table:


          1      P      Q      R      PQ     QR     PQR    PR
          P      1      PQ     PR     Q      PQR    QR     R
          Q      PQ     1      QR     P      R      PR     PQR
          R      PR     QR     1      PQR    Q      PQ     P
          PQ     Q      P      PQR    1      PR     R      QR
          QR     PQR    R      Q      PR     1      P      PQ
          PQR    QR     PR     PQ     R      P      1      Q
          PR     R      PQR    P      QR     PQ     Q      1


The other squares are then obtained by a cyclical permutation of the rows of
this square. We now write 2 instead of P, 3 instead of Q, etc. and obtain:


                  12345678                  12345678
                  21583764                  35162487
                  35162487                  48617352
          L_1 =   48617352          L_2 =   53271846
                  53271846                  67438125
                  67438125                  76854213
                  76854213                  84726531
                  84726531                  21583764


and so forth.
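
The whole set of 7 squares of side 8 can be generated from the automorphism S. The Python sketch below is an illustration under an encoding of our own: each element is a 3-bit mask (bit values 1, 2, 4 for P, Q, R), so that multiplication in G becomes the exclusive-or of masks, since every element has order 2. The names `apply_s`, `key` and `square` are hypothetical.

```python
P, Q, R = 1, 2, 4  # bit masks; the product of two elements is the XOR of their masks


def apply_s(x):
    """Image of x under the automorphism S: P -> Q, Q -> R, R -> PQ."""
    image = 0
    if x & P:
        image ^= Q
    if x & Q:
        image ^= R
    if x & R:
        image ^= P | Q
    return image


# Elements in the order 1, P, P^S, P^(S^2), ..., P^(S^6).
order = [0]
x = P
for _ in range(7):
    order.append(x)
    x = apply_s(x)

number = {element: index + 1 for index, element in enumerate(order)}
key = [[number[g ^ h] for h in order] for g in order]  # the multiplication table


def square(i):
    """Key square with its last 7 rows cyclically permuted i times."""
    body = key[1:]
    return [key[0]] + body[i:] + body[:i]


def are_orthogonal(a, b):
    """No ordered pair may occur twice when a is superimposed on b."""
    n = len(a)
    return len({(a[r][c], b[r][c]) for r in range(n) for c in range(n)}) == n * n


squares = [square(i) for i in range(7)]
rows_of_first = ["".join(str(v) for v in row) for row in squares[0]]
```

`rows_of_first` reproduces the square L_1 above, and `squares[1]` the square L_2.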

For the group of order 9, generated by P, Q with the relations P^3 = Q^3 = 1
the automorphism P^S = Q, Q^S = PQ has the property that S, S^2, ..., S^7 maps
no element except 1 into itself. For the group of order 16 we have 4 basis elements
P, Q, R, T with P^2 = Q^2 = R^2 = T^2 = 1 and S can be given by P^S = Q, Q^S = R,
R^S = T, T^S = PT.

For the group of order 25 we have two basis elements P, Q with P^5 = Q^5 = 1
and the automorphism is given by P^S = Q, Q^S = P^3 Q.

The group of order 27 is generated by P, Q, R and the defining relations are
P^3 = Q^3 = R^3 = 1. The automorphism is given by P^S = Q, Q^S = R, R^S = P^2 Q.

We have now shown

THEOREM 3: Let m = p^n and let G be the commutative group generated by P_1,
..., P_n which satisfy the relations P_1^p = P_2^p = ··· = P_n^p = 1. Let S be an
automorphism such that P^{S^i} ≠ P if 0 < i ≤ m − 2, P ≠ 1. Then the Latin
squares


          1                P                    P^S                    ...   P^{S^{m−2}}
          P^{S^i}          P^{S^i} P            P^{S^i} P^S            ...   P^{S^i} P^{S^{m−2}}
  L_i =   P^{S^{i+1}}      P^{S^{i+1}} P        P^{S^{i+1}} P^S        ...   P^{S^{i+1}} P^{S^{m−2}}        (i = 0, 1, ..., m − 2),
          ...              ...                  ...                    ...   ...
          P^{S^{i+m−2}}    P^{S^{i+m−2}} P      P^{S^{i+m−2}} P^S      ...   P^{S^{i+m−2}} P^{S^{m−2}}


are orthogonal. L_i is obtained from L_{i−1} by a cyclical permutation of its last m − 1
rows.


4. Remarks on the largest number of m-sided orthogonal Latin squares, for 
arbitrary m. 

The general problem can be formulated as follows: Given a number m, what
is the greatest number of orthogonal m-sided squares?

It is clear that this number cannot be larger than m − 1. For we can by
renaming the numbers of the squares always transform them without changing
their orthogonality in such a way that the first row is 1, 2, ..., m. Hence the
pairs 1 1, 2 2, ..., m m, occur for any two squares in the first row of the
resulting square. Hence the numbers in the first column and second row of the
squares must be different from 1 and different from each other. But we have
only the numbers 2, ..., m at our disposal and these are only m − 1 numbers.

We have shown that if m is the power of a prime m − 1 orthogonal squares
can always be constructed by the use of groups. Hence our problem is solved if
m is the power of a prime. Very little is known about numbers which are not
prime powers. Tarry (Compte Rendu, 1900) has shown that no 6 sided
Graeco-Latin square exists. It is conjectured but not yet proved that no Graeco-Latin
square of side 4n + 2 exists. We shall, however, show the following: If m =
p_1^{e_1} ··· p_n^{e_n} where p_i is a prime number (p_i ≠ p_j for i ≠ j) and if r = minimum
p_i^{e_i} − 1 then r orthogonal Latin squares can be constructed from commutative
groups of order m.

We take the group G of order m generated by e_1 elements of order p_1, e_2
elements of order p_2, ..., e_n elements of order p_n. We determine the
automorphisms T_i of the subgroup generated by the elements of order p_i such that T_i,
T_i^2, ..., T_i^{p_i^{e_i} − 2} leave no element of order p_i fixed. We define then an
automorphism T'_i of G generated by changing the basis elements of order p_i in the
same way as they are changed by T_i and leaving the other basis elements fixed.
Then

          T = T'_1 T'_2 ··· T'_n

is an automorphism whose first r − 1 powers leave no element fixed. Hence the
r Latin squares

          1              A_2                 ...   A_m
          A_2^{T^i}      A_2^{T^i} A_2       ...   A_2^{T^i} A_m
  L_i =   ...            ...                 ...   ...                      (i = 0, 1, ..., r − 1)
          A_m^{T^i}      A_m^{T^i} A_2       ...   A_m^{T^i} A_m

are orthogonal.

We shall exemplify this by constructing 3 orthogonal squares of side 20.
We use the group G generated by P, Q, R with the defining relations: P^2 =
Q^2 = 1; R^5 = 1. The automorphisms are given by: P^{T'_1} = Q, Q^{T'_1} = PQ,
R^{T'_1} = R, P^{T'_2} = P, Q^{T'_2} = Q, R^{T'_2} = R^2. Hence T = T'_1 T'_2 is given by: P^T = Q,
Q^T = PQ, R^T = R^2. Therefore we have: P^T = Q, P^{T^2} = PQ, P^{T^3} = P^T Q^T = P,
(PR)^T = QR^2, (PR)^{T^2} = PQR^4, (PR)^{T^3} = PR^3, (PR)^{T^4} = QR, (PR)^{T^5} = PQR^2,
(PR)^{T^6} = PR^4, (PR)^{T^7} = QR^3, (PR)^{T^8} = PQR, (PR)^{T^9} = PR^2, (PR)^{T^10} = QR^4,
(PR)^{T^11} = PQR^3, (PR)^{T^12} = PR, R^T = R^2, R^{T^2} = R^4, R^{T^3} = R^3, R^{T^4} = R.

We need only construct one key square if we write down the elements in the
way in which they are written above. Then we have only to mark the end of
each cycle. Thus in our present case we have:


1 | P, Q, PQ | PR, QR^2, PQR^4, PR^3, QR, PQR^2, PR^4, QR^3, PQR, PR^2, QR^4,
PQR^3 | R, R^2, R^4, R^3 |


The vertical lines mark the cycles in which the elements are permuted by the
automorphisms. We then write down the key square in Table I. From this
key square we can easily obtain a set of 3 orthogonal squares by permuting the
rows within the cycles indicated. Because of space difficulties we give only the
first half of the square in Table II.


TABLE I

  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

  2  1  4  3 17 10 15 20 13  6 19 16  9 18  7 12  5 14 11  8
  3  4  1  2 13 18 11 16 17 14  7 20  5 10 19  8  9  6 15 12
  4  3  2  1  9 14 19 12  5 18 15  8 17  6 11 20 13 10  7 16

  5 17 13  9 18 16  3 19 10 12  1  7  6 20  4 15 14  8  2 11
  6 10 18 14 16 19  5  4 20 11 13  1  8  7 17  2 12 15  9  3
  7 15 11 19  3  5 20  6  2 17 12 14  1  9  8 18  4 13 16 10
  8 20 16 12 19  4  6 17  7  3 18 13 15  1 10  9 11  2 14  5
  9 13 17  5 10 20  2  7 18  8  4 19 14 16  1 11  6 12  3 15
 10  6 14 18 12 11 17  3  8 19  9  2 20 15  5  1 16  7 13  4
 11 19  7 15  1 13 12 18  4  9 20 10  3 17 16  6  2  5  8 14
 12 16 20  8  7  1 14 13 19  2 10 17 11  4 18  5 15  3  6  9
 13  9  5 17  6  8  1 15 14 20  3 11 18 12  2 19 10 16  4  7
 14 18 10  6 20  7  9  1 16 15 17  4 12 19 13  3  8 11  5  2
 15  7 19 11  4 17  8 10  1  5 16 18  2 13 20 14  3  9 12  6
 16 12  8 20 15  2 18  9 11  1  6  5 19  3 14 17  7  4 10 13

 17  5  9 13 14 12  4 11  6 16  2 15 10  8  3  7 18 20  1 19
 18 14  6 10  8 15 13  2 12  7  5  3 16 11  9  4 20 19 17  1
 19 11 15  7  2  9 16 14  3 13  8  6  4  5 12 10  1 17 20 18
 20  8 12 16 11  3 10  5 15  4 14  9  7  2  6 13 19  1 18 17


TABLE II

   1,1    2,2    3,3    4,4    5,5    6,6    7,7    8,8    9,9  10,10
     1      2      3      4      5      6      7      8      9     10
   2,3    1,4    4,1    3,2  17,13  10,18  15,11  20,16  13,17   6,14
     4      3      2      1      9     14     19     12      5     18
   3,4    4,3    1,2    2,1   13,9  18,14  11,19  16,12   17,5  14,18
     2      1      4      3     17     10     15     20     13      6
   4,2    3,1    2,4    1,3   9,17  14,10  19,15  12,20   5,13   18,6
     3      4      1      2     13     18     11     16     17     14
   5,6  17,10  13,18   9,14  18,16  16,19    3,5   19,4  10,20  12,11
     7     15     11     19      3      5     20      6      2     17
   6,7  10,15  18,11  14,19   16,3   19,5   5,20    4,6   20,2  11,17
     8     20     16     12     19      4      6     17      7      3
   7,8  15,20  11,16  19,12   3,19    5,4   20,6   6,17    2,7   17,3
     9     13     17      5     10     20      2      7     18      8
   8,9  20,13  16,17   12,5  19,10   4,20    6,2   17,7   7,18    3,8
    10      6     14     18     12     11     17      3      8     19
  9,10   13,6  17,14   5,18  10,12  20,11   2,17    7,3   18,8   8,19
    11     19      7     15      1     13     12     18      4      9
 10,11   6,19   14,7  18,15   12,1  11,13  17,12   3,18    8,4   19,9
    12     16     20      8      7      1     14     13     19      2
 11,12  19,16   7,20   15,8    1,7   13,1  12,14  18,13   4,19    9,2
    13      9      5     17      6      8      1     15     14     20
 12,13   16,9   20,5   8,17    7,6    1,8   14,1  13,15  19,14   2,20
    14     18     10      6     20      7      9      1     16     15
 13,14   9,18   5,10   17,6   6,20    8,7    1,9   15,1  14,16  20,15
    15      7     19     11      4     17      8     10      1      5
 14,15   18,7  10,19   6,11   20,4   7,17    9,8   1,10   16,1   15,5
    16     12      8     20     15      2     18      9     11      1
 15,16   7,12   19,8  11,20   4,15   17,2   8,18   10,9   1,11    5,1
     5     17     13      9     18     16      3     19     10     12
  16,5  12,17   8,13   20,9  15,18   2,16   18,3   9,19  11,10   1,12
     6     10     18     14     16     19      5      4     20     11
 17,18   5,14    9,6  13,10   14,8  12,15   4,13   11,2   6,12   16,7
    19     11     15      7      2      9     16     14      3     13
 18,19  14,11   6,15   10,7    8,2   15,9  13,16   2,14   12,3   7,13
    20      8     12     16     11      3     10      5     15      4
 19,20   11,8  15,12   7,16   2,11    9,3  16,10   14,5   3,15   13,4
    17      5      9     13     14     12      4     11      6     16
 20,17    8,5   12,9  16,13  11,14   3,12   10,4   5,11   15,6   4,16
    18     14      6     10      8     15     13      2     12      7

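
The example of side 20 can be recomputed from the automorphism T. The Python sketch below is illustrative only and rests on an encoding chosen here: an element P^a Q^b R^e is stored as the pair (g, e) with g = a + 2b, so that the P, Q part multiplies by exclusive-or and the R part adds exponents mod 5. The names `multiply`, `apply_t`, `cycle` and `square` are hypothetical helpers.

```python
def multiply(x, y):
    """Product in G; x = (g, e) stands for P^a Q^b R^e with g = a + 2b."""
    return (x[0] ^ y[0], (x[1] + y[1]) % 5)


def apply_t(x):
    """Automorphism T: P -> Q, Q -> PQ, R -> R^2."""
    g, e = x
    image = 0
    if g & 1:
        image ^= 2  # P is mapped on Q
    if g & 2:
        image ^= 3  # Q is mapped on PQ
    return (image, (2 * e) % 5)


def cycle(start):
    """The elements start, start^T, start^(T^2), ... until the cycle closes."""
    out, x = [start], apply_t(start)
    while x != start:
        out.append(x)
        x = apply_t(x)
    return out


# Order 1 | P, Q, PQ | PR, QR^2, ... | R, R^2, R^4, R^3 as in the text.
order = [(0, 0)] + cycle((1, 0)) + cycle((1, 1)) + cycle((0, 1))
number = {element: index + 1 for index, element in enumerate(order)}


def square(i):
    """Square L_i: the row of A_k is the key-square row of A_k mapped i times by T."""
    rows = []
    for x in order:
        for _ in range(i):
            x = apply_t(x)
        rows.append([number[multiply(x, y)] for y in order])
    return rows


def are_orthogonal(a, b):
    """No ordered pair may occur twice when a is superimposed on b."""
    n = len(a)
    return len({(a[r][c], b[r][c]) for r in range(n) for c in range(n)}) == n * n


squares = [square(i) for i in range(3)]
```

`squares[0]` is the key square of Table I; superimposing `squares[0]`, `squares[1]` and `squares[2]` gives the triples of Table II. A fourth square built with T^3 would repeat the row of P, because T^3 leaves P fixed, which is why only r = 3 squares are obtained.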

One might hope that with other groups more than r = minimum p_i^{e_i} − 1
orthogonal squares might be obtained. It has been shown however that using
any group and its automorphisms at most r orthogonal squares can be obtained.
A more general method based on groups is given in a recent paper (H. B.
Mann, “The construction of sets of orthogonal Latin squares,” Annals of Math.
Stat., Vol. 13 (1942)). It can be shown that also with this more general method
no 4n + 2 sided Graeco-Latin square can be constructed.


ON THE DEPENDENCE OF SAMPLING INSPECTION PLANS UPON
POPULATION DISTRIBUTIONS


By Alexander M. Mood

University of Texas¹


1. Introduction. The foundations of the science of quality control and 
quality determination have been laid by W. A. Shewhart [1, 2]. His ideas per- 
vade what follows, but they are too well known to require discussion here. There 
is, however, one that should be specifically mentioned, that of statistically con- 
trolled production, because it provides the justification for the basic assumption 
of this paper: When production is statistically controlled, there exists a probability,
P(N, X), that a lot of size N will contain X defective items. Shewhart has given 
a complete discussion of assumptions of this nature. 

Sampling inspection of lots may take one of two courses: 

(a) Item inspection, in which a lot is accepted or completely inspected on the 

basis of one or more samples drawn from the lot. 

(b) Lot inspection, in which a lot is accepted or rejected on the basis of one 

or more samples drawn from the lot. 
The former has been extensively studied by Dodge and Romig [3, 4, 5]; the latter 
has received little attention, but some of the basic ideas of Dodge and Romig are 
applicable to this case also. 

In this paper the approach to the general problem of lot inspection will be 
different from that of Dodge and Romig in one important respect: The role of 
the population distribution function will be emphasized, whereas they have 
directed their attention to methods which require no knowledge of the popula- 
tion distribution. Their techniques are particularly valuable when a prob- 
ability distribution does not exist, that is, when production is not statistically 
controlled. The interest here will be in the inspection of lots which may be 
regarded as having been drawn from a statistical population. After the first 
sample from the first lot has been drawn, something is known of the distribution 
of that population, and as the inspection proceeds a great body of knowledge 
may be accumulated. Here, if ever, is a real opportunity to explore and to use 
a population distribution. The very nature of inspection supplies a continuous 
flow of information about it. To neglect this information would be wasteful 
indeed. 

It is, therefore, the object of this paper to point the way to more efficient in- 
spection procedures for situations in which production is statistically controlled. 
The inspection procedure will be considered to be an inferential process—on 
the basis of one or more samples, and with whatever information is available 
about the parent distribution, an inference will be made regarding the quality 


1 On leave to the War Department.
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is equal to a power of a mod p. For example, 3 is a primitive root mod 7, for 
3° =1(7),3' = 3(7), 3° = 2(7), 3° = 6(7), 3* = 4(7), 3° = 5(7). 

For any number a we must have a?’ = 1 (p). We will prove this equation 
for primitive roots only, since we do not need the general case. Let a be a primi- 
tive root and assume that 


a?" = b =a‘ (p), witha <p-—1. 

Then we would have 
a’ ** = a” = 1 (p), with p’ <p —1. 
Hence we can obtain at most p — 2 different remainders a°’a’, ---, a” anda 


would not be a primitive root. 
We now form 






.. . or. 
_ yn 1 + "ee ow p om ti + - 
L 1+% +e" ne aa p—-1l+a™ ; (i = 0,1, ---,p — 2) 






—2-+4 —2+4 244 
ga? ** 1 + a” +4 -+ + p—1l+a? 2 


Exactly as in the case of the L; of Theorem 1 it can be shown that Z; is orthog- 
onal to L;if i # j. Fork < p — 1 the k-th row of L, equals the (k — 1)st 
row of L;,, and since a?’ = 1 (p) the last row of Zi: equals the first row of L;. 
Hence Z;,; is obtained from L; by a cyclical permutation of the (p — 1) last 
rows. It is then only necessary to construct the first square. The others can 
be obtained by a cyclic permutation of the (p — 1) last rows. We shall exem- 


plify this by constructing a set of 6 seven-sided orthogonal squares. 


Ty Tz L; 


ew oF NH © OD 





5 





0123456 0 ls 
4560123 5 
5601234 1 
1234560 3 
3456012 2 
2345601 6 
6012345 4 
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In the theory of numbers it is shown that a primitive root exists for every 
prime number. If p is not too large a primitive root can easily be found by trial 
anderror. We give a list of primitive roots for all primes under 30: 


Prime number Primitive root 

3 2 

5 2 

7 3 
11 2 
13 2 
17 3 
19 2 
23 5 
29 2 


In computing the first row of the first square it is not necessary to actually com- 
pute all powers of the primitive root. We can take advantage of the fact that a 
congruence may be multiplied by a number. Thus, for instance, for the first 
row of the 11-sided square we have 2° =1(11) 2)=2(11) 2?=4(11) 2= 


8 (11) 2*=5(11) 2°=25 =10(11) but 10 = —1 (11), hence we have 
without further computation 2° = -—2 =9 (11) 2° = —-4=7(11) 2= 
-§ = 3 (11) 2 = -5 = 6 (11). 


3. Complete sets of m-sided orthogonal Latin squares, where m is the power 
of a prime. 

We have seen that we can always construct m — 1 orthogonal Latin squares 
if m is a prime number. We shall show how to construct m — 1 orthogonal 
Latin squares if m is the power of a prime. However, if we need only a Graeco- 
Latin square of side m and if m is odd, then we can use the following theorem: 

THEOREM 2: If m is odd, then the squares 


0 1 es m—1 

1 1+1 . m—-1+1 

2 1+2 oes m—1+2 
lL, = ‘ ‘ cs ‘ 


m—-1l11i+m-—-1l+++m-1l+m-i1 


0 1 ce m—1 
2 1+2 oe 4 m—-1+2 
2.2 1 + 2.2 ce m—1+ 2.2 
L. = ‘ ‘ ves ‘ 


2im—1) 1+2(m—1) -++m—1+2(m—1) 
are orthogonal. 
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The proof is similar to the proof of Theorem 1. We have to use the fact that 
2 is prime to m. 

We shall now prove the following statement: For every remainder
$a \not\equiv 0 \pmod{p}$ there exists another remainder $a'$ such that
$a \cdot a' \equiv 1 \pmod{p}$.

PROOF: We form the sequence $a, a^2, \ldots, a^n, \ldots$. Since there is only
a finite number of remainders, there must exist 2 values i and j such that

$$a^i \equiv a^j \pmod{p}.$$

Let $i > j$. Then since a is prime to p we may divide by $a^j$. Putting
$i - j = d$, we obtain

$$a^{i-j} = a^d \equiv 1 \pmod{p}.$$

Hence we may take $a' = a^{d-1}$ and our statement is proved. Thus we see that
the system of remainders mod p with respect to addition as well as with respect
to multiplication if 0 is excluded satisfies the following postulates:

(1) For every pair of elements A, B there is defined a product $A \cdot B$
within the system such that for any 3 elements A, B and C

$$A \cdot (B \cdot C) = (A \cdot B) \cdot C \qquad \text{(associative law)}$$

The "multiplication" may be any sort of composition. For example, either
addition or multiplication of remainder classes.

(2) There exists a unit element 1 such that

$$A \cdot 1 = 1 \cdot A = A.$$

(3) For every A in the system there exists an element $A^{-1}$ such that

$$A \cdot A^{-1} = A^{-1} \cdot A = 1.$$

The unit element will be 0 if we consider the remainder classes with addition
as composition. It will be 1 if multiplication is the composition. The inverse
of a is $-a$ for the additive system, $a'$ for the multiplicative system.
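The existence proof for the multiplicative inverse is constructive: one runs through the powers of a until the remainder 1 appears and takes the power just before it. A minimal sketch of that procedure follows (the function name is ours, and this is an illustration of the argument, not an efficient method).

```python
def inverse_by_powers(a, p):
    """Return a' with a*a' = 1 (mod p), found as the power of a preceding a^d = 1."""
    if a % p == 0:
        raise ValueError("0 has no multiplicative inverse")
    previous, power = 1, a % p       # a^0 and a^1
    while power != 1:
        previous, power = power, (power * a) % p
    return previous                  # this is a^(d-1), where a^d = 1 (mod p)
```

For example, with p = 7 and a = 3 the powers are 3, 2, 6, 4, 5, 1, so the function returns 5, and 3 · 5 = 15 leaves the remainder 1.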

A system satisfying (1), (2) and (3) is called a group. The property
$A \cdot B = B \cdot A$ is usually not postulated. If a group fulfills this
condition, then it is called a commutative group or an Abelian group. A group
can be defined by its generating elements. For example, let G be generated by
the elements P, Q with the relations $P^2 = 1$, $Q^3 = 1$ and $PQ = Q^2P$. We
then obtain the elements of G as 1, P, Q, PQ, $PQ^2$, $Q^2$. The rules for the
multiplication can be written down in the form of a table:

         1      P      Q      PQ     PQ^2   Q^2

1        1      P      Q      PQ     PQ^2   Q^2
P        P      1      PQ     Q      Q^2    PQ^2
Q        Q      PQ^2   Q^2    P      PQ     1
PQ       PQ     Q^2    PQ^2   1      Q      P
PQ^2     PQ^2   Q      P      Q^2    1      PQ
Q^2      Q^2    PQ     1      PQ^2   P      Q
By inspection one can see that taking the elements of our group as symbols
the multiplication table forms a Latin square. For instance, if we identify P
with 2, Q with 3, etc. we obtain from the table above

1 2 3 4 5 6
2 1 4 3 6 5
3 5 6 2 4 1
4 6 5 1 3 2
5 3 2 6 1 4
6 4 1 5 2 3

We shall prove that this is generally true. Let the group G consist of the
elements $A_1, \ldots, A_m$ (with $A_1 = 1$). We write down the multiplication
table of the group:

$$\begin{matrix}
A_1 & A_2 & \cdots & A_m \\
A_2 & A_2A_2 & \cdots & A_2A_m \\
\vdots & \vdots & & \vdots \\
A_m & A_mA_2 & \cdots & A_mA_m
\end{matrix}$$

Suppose this is not a Latin square. Then an element will occur twice in at least
one row or at least one column, that is, we should have either

$$A_iA_j = A_iA_k, \quad \text{for } j \neq k$$

or

$$A_jA_i = A_kA_i, \quad \text{for } j \neq k.$$

Multiplying the first equation by $A_i^{-1}$ on the left, we obtain
$A_j = A_k$. Hence $j = k$. Similarly in the second case $j = k$, contrary to
our assumption.
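The six-element example can be reproduced mechanically. The sketch below is an illustration under one stated assumption: P and Q are realised as permutations of three objects (a transposition and a 3-cycle), which is one concrete model satisfying the relations P² = 1, Q³ = 1, PQ = Q²P.

```python
def compose(a, b):
    """Product a*b of two permutations of (0, 1, 2): apply b first, then a."""
    return tuple(a[b[x]] for x in range(3))

E = (0, 1, 2)                  # unit element
P = (1, 0, 2)                  # a transposition, so P*P = 1
Q = (1, 2, 0)                  # a 3-cycle, so Q*Q*Q = 1
Q2 = compose(Q, Q)

# Elements in the order 1, P, Q, PQ, PQ^2, Q^2, numbered 1..6.
elements = [E, P, Q, compose(P, Q), compose(P, Q2), Q2]
number = {g: k + 1 for k, g in enumerate(elements)}

# Entry in row r, column c is the number of the product r*c.
table = [[number[compose(r, c)] for c in elements] for r in elements]
```

Each row and each column of `table` contains the numbers 1 to 6 exactly once, in agreement with the argument just given.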

Two groups $G$ and $\bar G$ are called isomorphic if we can map $G$ into
$\bar G$ in such a way that the mapping is not disturbed by multiplication.
That is, if $A$ is mapped on $\bar A$ and $B$ on $\bar B$ and if $AB = C$ and
$\bar A \bar B = \bar C$, then $C$ must be mapped on $\bar C$. Such a mapping
is called an isomorphism. If $G = \bar G$ then the mapping is called an
automorphism. For instance, if we consider the remainder system mod m with
addition as composition and j is any remainder, then the mapping
$\bar a = ja$ is an automorphism. For if

$$a + b \equiv c \pmod{m}$$

then

$$aj + bj \equiv cj \pmod{m}.$$

Some automorphisms establish a 1-to-1 correspondence between the elements
of G. For instance, in the above example if j is prime to m the correspondence
is bi-unique (that is, only one element is mapped on any element of $\bar G$)
because if

$$aj \equiv bj \pmod{m},$$

and j is prime to m then

$$a \equiv b \pmod{m}.$$

If j is not prime to m, the mapping would not be unique although it would still
be an automorphism. From now on we shall consider only automorphisms
which establish a 1-to-1 correspondence between the elements of G.
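The example of the mapping a → ja on the remainders mod m can be checked directly. In this sketch (helper names are ours) the mapping always respects addition, and it is one-to-one exactly when j is prime to m.

```python
from math import gcd

def times_j(j, m):
    """The mapping a -> j*a on the remainders mod m, returned as a list of images."""
    return [(j * a) % m for a in range(m)]

def preserves_addition(j, m):
    """True if the image of a + b is the sum of the images, for all a and b."""
    f = times_j(j, m)
    return all(f[(a + b) % m] == (f[a] + f[b]) % m
               for a in range(m) for b in range(m))

def is_one_to_one(j, m):
    """True if no two remainders have the same image."""
    return len(set(times_j(j, m))) == m
```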

Let S be such an automorphism and denote by $A^S$ the element into which A
is mapped under the automorphism S. We put $(A^S)^S = A^{S^2}$,
$(A^{S^2})^S = A^{S^3}$, etc. We also put $A^{S^0} = A$. We shall prove the
following theorem:

Let S be an automorphism such that $S, S^2, \ldots, S^q$ map no element into
itself except the element 1. Then the Latin squares

$$L_i = \begin{pmatrix}
A_1 & A_2 & \cdots & A_m \\
A_2^{S^i} & A_2^{S^i}A_2 & \cdots & A_2^{S^i}A_m \\
\vdots & \vdots & & \vdots \\
A_m^{S^i} & A_m^{S^i}A_2 & \cdots & A_m^{S^i}A_m
\end{pmatrix} \qquad (i = 0, 1, \ldots, q)$$

are orthogonal.

PROOF: Assume that $L_i$ is not orthogonal to $L_j$. Let $L_{ij}$ be the
resulting square if $L_i$ is superimposed on $L_j$. Then for some k and l and
some r and s we should have the same pair of elements in the kth row and lth
column and in the rth row and sth column. That is, we should have

$$A_k^{S^i} A_l = A_r^{S^i} A_s \tag{1}$$

$$A_k^{S^j} A_l = A_r^{S^j} A_s. \tag{2}$$

By taking the inverse elements it follows from (2) that

$$A_l^{-1} A_k^{-S^j} = A_s^{-1} A_r^{-S^j}. \tag{3}$$

Multiplying (1) and (3) we obtain

$$A_k^{S^i} A_k^{-S^j} = A_r^{S^i} A_r^{-S^j}.$$

Multiplying by $A_r^{-S^i}$ to the left, and by $A_k^{S^j}$ to the right, we
obtain

$$A_r^{-S^i} A_k^{S^i} = A_r^{-S^j} A_k^{S^j}.$$

Since $S^i$ and $S^j$ are automorphisms we have

$$(A_r^{-1}A_k)^{S^i} = (A_r^{-1}A_k)^{S^j}.$$

Assuming $i > j$, then

$$\left[(A_r^{-1}A_k)^{S^j}\right]^{S^{i-j}} = (A_r^{-1}A_k)^{S^j}.$$

Because of $i \le q$, $j \le q$ we have $i - j \le q$. By assumption therefore
$S^{i-j}$ can leave only 1 fixed. Therefore

$$(A_r^{-1}A_k)^{S^j} = 1.$$

Hence $A_r^{-1}A_k = 1$, that is, $A_k = A_r$. But then also

$$A_l = A_s.$$

Therefore $r = k$ and $l = s$. Hence the two compartments of $L_{ij}$ cannot be
different and our statement is proved.

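We see therefore that we can construct a set of $q + 1$ orthogonal Latin
squares if we can find a group G and an automorphism S of G such that

$$S, S^2, \ldots, S^q$$

map no element into itself except the unit element. If $q = m - 2$ and we write

$$L_0 = \begin{pmatrix}
1 & A_2 & A_2^{S} & \cdots & A_2^{S^{m-2}} \\
A_2 & A_2A_2 & A_2A_2^{S} & \cdots & A_2A_2^{S^{m-2}} \\
\vdots & \vdots & \vdots & & \vdots \\
A_2^{S^{m-2}} & A_2^{S^{m-2}}A_2 & A_2^{S^{m-2}}A_2^{S} & \cdots & A_2^{S^{m-2}}A_2^{S^{m-2}}
\end{pmatrix}$$

then the $(k - 1)$st row of $L_{i+1}$ equals the k-th row of $L_i$ and all
squares may be obtained from $L_0$ by a cyclical permutation of the rows.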
We shall now consider commutative groups of prime power order $p^n$ defined
by the relations

$$P_1^p = P_2^p = \cdots = P_n^p = 1, \qquad P_iP_j = P_jP_i.$$

The elements of this group G have the form

$$P_1^{e_1} \cdots P_n^{e_n}, \qquad e_1, \ldots, e_n = 0, 1, \ldots, p - 1.$$

We call $P_1, \ldots, P_n$ a basis of G. We can easily change the basis. For
instance if $P_1, \ldots, P_n$ is a basis then also
$P_1, P_1P_2, \ldots, P_1P_n$ is a basis. For every expression we have

$$P_1^{e_1} \cdots P_n^{e_n} = P_1^{e_1 - e_2 - \cdots - e_n}(P_1P_2)^{e_2} \cdots (P_1P_n)^{e_n},$$

since G is commutative. Such a change in the basis defines an automorphism of
G at the same time. For let $P_1', \ldots, P_n'$ be the new basis. We can map

$$P_1^{e_1} \cdots P_n^{e_n} \quad \text{into} \quad P_1'^{\,e_1} \cdots P_n'^{\,e_n}.$$

On the other hand an automorphism is determined if we know on what elements
the basis elements are mapped.

It can be shown that every such group admits an automorphism S such that
$S, S^2, \ldots, S^{p^n - 2}$ leave no element fixed except 1. Hence we can
always construct a set of $p^n - 1$ orthogonal squares of side $p^n$ if p is a
prime. We shall give these automorphisms explicitly for the groups of order
8, 9, 16, 25 and 27.

As an example let us construct 7 orthogonal 8-sided squares. We shall use
the group G generated by P, Q, R where $P^2 = Q^2 = R^2 = 1$. We use the
automorphism S where

$$P^S = Q, \qquad Q^S = R, \qquad R^S = PQ.$$

We then have $P^S = Q$, $P^{S^2} = R$, $P^{S^3} = PQ$,
$P^{S^4} = P^SQ^S = QR$, $P^{S^5} = Q^SR^S = PQR$,
$P^{S^6} = P^SQ^SR^S = QRPQ = PR$, $P^{S^7} = P^SR^S = QPQ = P$.
If we write the elements in the order $1, P, P^S, P^{S^2}, \ldots, P^{S^6}$ we
obtain the following multiplication table:

1     P     Q     R     PQ    QR    PQR   PR
P     1     PQ    PR    Q     PQR   QR    R
Q     PQ    1     QR    P     R     PR    PQR
R     PR    QR    1     PQR   Q     PQ    P
PQ    Q     P     PQR   1     PR    R     QR
QR    PQR   R     Q     PR    1     P     PQ
PQR   QR    PR    PQ    R     P     1     Q
PR    R     PQR   P     QR    PQ    Q     1

The other squares are then obtained by a cyclical permutation of the rows of
this square. We now write 2 instead of P, 3 instead of Q, etc. and obtain:

        12345678            12345678
        21583764            35162487
        35162487            48617352
L_0 =   48617352    L_1 =   53271846
        53271846            67438125
        67438125            76854213
        76854213            84726531
        84726531            21583764

and so forth.
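The construction of the seven 8-sided squares can be sketched in code. The representation is an assumption made for the illustration: P, Q, R are stored as the bits 1, 2, 4, so that the group product is the bitwise XOR, and the automorphism S is extended from the basis to all elements.

```python
P, Q, R = 1, 2, 4                      # basis elements as bits; the product is XOR

def S(x):
    """The automorphism with P -> Q, Q -> R, R -> PQ, extended to all of G."""
    images = {P: Q, Q: R, R: P ^ Q}
    out = 0
    for b in (P, Q, R):
        if x & b:
            out ^= images[b]
    return out

# Elements in the order 1, P, P^S, P^(S^2), ..., P^(S^6), numbered 1..8.
order = [0, P]
while len(order) < 8:
    order.append(S(order[-1]))
number = {g: k + 1 for k, g in enumerate(order)}

def latin_square(i):
    """L_i: the key square with its last seven rows cyclically shifted i places."""
    rows = [order[0]] + [order[1 + (k + i) % 7] for k in range(7)]
    return [[number[r ^ c] for c in order] for r in rows]

def are_orthogonal(a, b):
    """True if superimposing a and b gives every ordered pair exactly once."""
    m = len(a)
    return len({(a[i][j], b[i][j]) for i in range(m) for j in range(m)}) == m * m

squares = [latin_square(i) for i in range(7)]
```

`squares[0]` and `squares[1]` are the two numbered squares displayed above, and every pair among the seven squares passes the orthogonality check.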

For the group of order 9, generated by P, Q with the relations
$P^3 = Q^3 = 1$, the automorphism $P^S = Q$, $Q^S = PQ$ has the property that
$S, S^2, \ldots, S^7$ map no element into itself. For the group of order 16 we
have 4 basis elements P, Q, R, T with $P^2 = Q^2 = R^2 = T^2 = 1$ and S can be
given by $P^S = Q$, $Q^S = R$, $R^S = T$, $T^S = PT$.

For the group of order 25 we have two basis elements P, Q with
$P^5 = Q^5 = 1$ and the automorphism is given by $P^S = Q$, $Q^S = P^3Q$.

The group of order 27 is generated by P, Q, R and the defining relations are
$P^3 = Q^3 = R^3 = 1$. The automorphism is given by $P^S = Q$, $Q^S = R$,
$R^S = P^2Q$.
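One way to test such an automorphism is to follow the first basis element through its successive images: if the cycle has length pⁿ − 1, all elements other than 1 lie on a single cycle and no power S, S², ..., S^(pⁿ−2) can fix any of them. The sketch below does this with elements stored as exponent vectors. The images used for the groups of order 25 and 27 (P³Q and P²Q) are assumptions where the printed exponents are hard to read; they are the values of that form for which the check succeeds.

```python
def apply(images, x, p):
    """Image of the element with exponent vector x; images[k] is the image of basis element k."""
    out = [0] * len(x)
    for e, img in zip(x, images):
        for t, v in enumerate(img):
            out[t] = (out[t] + e * v) % p
    return tuple(out)

def cycle_length(images, p):
    """Number of steps until the first basis element returns to itself."""
    start = (1,) + (0,) * (len(images) - 1)
    x, k = apply(images, start, p), 1
    while x != start:
        x, k = apply(images, x, p), k + 1
    return k

# order of group: (p, images of the basis elements as exponent vectors)
cases = {
    8:  (2, [(0, 1, 0), (0, 0, 1), (1, 1, 0)]),                       # P->Q, Q->R, R->PQ
    9:  (3, [(0, 1), (1, 1)]),                                        # P->Q, Q->PQ
    16: (2, [(0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (1, 0, 0, 1)]),  # ..., T->PT
    25: (5, [(0, 1), (3, 1)]),                                        # P->Q, Q->P^3 Q (assumed)
    27: (3, [(0, 1, 0), (0, 0, 1), (2, 1, 0)]),                       # ..., R->P^2 Q (assumed)
}
lengths = {m: cycle_length(imgs, p) for m, (p, imgs) in cases.items()}
```

In each case the cycle length comes out as the order of the group less one.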

We have now shown 

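THEOREM 3: Let $m = p^n$ and let G be the commutative group generated by
$P_1, \ldots, P_n$ which satisfy the relations
$P_1^p = P_2^p = \cdots = P_n^p = 1$. Let S be an automorphism such that
$P^{S^i} \neq P$ if $0 < i \le m - 2$, $P \neq 1$. Then the Latin squares

$$L_j = \begin{pmatrix}
1 & P & P^{S} & \cdots & P^{S^{m-2}} \\
P^{S^{j}} & P^{S^{j}}P & P^{S^{j}}P^{S} & \cdots & P^{S^{j}}P^{S^{m-2}} \\
P^{S^{j+1}} & P^{S^{j+1}}P & P^{S^{j+1}}P^{S} & \cdots & P^{S^{j+1}}P^{S^{m-2}} \\
\vdots & \vdots & \vdots & & \vdots \\
P^{S^{m-2+j}} & P^{S^{m-2+j}}P & P^{S^{m-2+j}}P^{S} & \cdots & P^{S^{m-2+j}}P^{S^{m-2}}
\end{pmatrix} \qquad (j = 0, 1, \ldots, m - 2),$$

are orthogonal. $L_j$ is obtained from $L_{j-1}$ by a cyclical permutation of
its last $m - 1$ rows.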


4. Remarks on the largest number of m-sided orthogonal Latin squares, for 
arbitrary m. 

The general problem can be formulated as follows: Given a number m, what
is the greatest number of orthogonal m-sided squares?

It is clear that this number cannot be larger than $m - 1$. For we can by
renaming the numbers of the squares always transform them without changing
their orthogonality in such a way that the first row is $1, 2, \ldots, m$.
Hence the pairs 1 1, 2 2, ..., m m occur for any two squares in the first row
of the resulting square. Hence the numbers in the first column and second row
of the squares must be different from 1 and different from each other. But we
have only the numbers $2, \ldots, m$ at our disposal and these are only
$m - 1$ numbers.

We have shown that if m is the power of a prime $m - 1$ orthogonal squares
can always be constructed by the use of groups. Hence our problem is solved if
m is the power of a prime. Very little is known about numbers which are not
prime powers. Tarry (Compte Rendu, 1900) has shown that no 6-sided Graeco-
Latin square exists. It is conjectured but not yet proved that no Graeco-Latin
square of side $4n + 2$ exists. We shall, however, show the following: If
$m = p_1^{e_1} \cdots p_n^{e_n}$ where $p_i$ is a prime number
($p_i \neq p_j$ for $i \neq j$) and if $r = \min_i\, p_i^{e_i} - 1$ then r
orthogonal Latin squares can be constructed from commutative groups of order m.

We take the group G of order m generated by $e_1$ elements of order $p_1$,
$e_2$ elements of order $p_2$, ..., $e_n$ elements of order $p_n$. We determine
the automorphisms $T_i'$ of the subgroup generated by the elements of order
$p_i$ such that $T_i', T_i'^{\,2}, \ldots, T_i'^{\,r-1}$ leave no element of
order $p_i$ fixed. We define then an automorphism $T_i$ of G generated by
changing the basis elements of order $p_i$ in the same way as they are changed
by $T_i'$ and leaving the other basis elements fixed. Then

$$T = T_1T_2 \cdots T_n$$

is an automorphism whose first $r - 1$ powers leave no element fixed. Hence the
r Latin squares

$$L_i = \begin{pmatrix}
A_1 & A_2 & \cdots & A_m \\
A_2^{T^i} & A_2^{T^i}A_2 & \cdots & A_2^{T^i}A_m \\
\vdots & \vdots & & \vdots \\
A_m^{T^i} & A_m^{T^i}A_2 & \cdots & A_m^{T^i}A_m
\end{pmatrix} \qquad (i = 0, 1, \ldots, r - 1)$$

are orthogonal.

We shall exemplify this by constructing 3 orthogonal squares of side 20.
We use the group G generated by P, Q, R with the defining relations:
$P^2 = Q^2 = 1$; $R^5 = 1$. The automorphisms are given by: $P^{T_1} = Q$,
$Q^{T_1} = PQ$, $R^{T_1} = R$, $P^{T_2} = P$, $Q^{T_2} = Q$, $R^{T_2} = R^2$.
Hence $T = T_1T_2$ is given by: $P^T = Q$, $Q^T = PQ$, $R^T = R^2$. Therefore
we have: $P^T = Q$, $P^{T^2} = PQ$, $P^{T^3} = P^TQ^T = P$,
$(PR)^T = QR^2$, $(PR)^{T^2} = PQR^4$, $(PR)^{T^3} = PR^3$,
$(PR)^{T^4} = QR$, $(PR)^{T^5} = PQR^2$, $(PR)^{T^6} = PR^4$,
$(PR)^{T^7} = QR^3$, $(PR)^{T^8} = PQR$, $(PR)^{T^9} = PR^2$,
$(PR)^{T^{10}} = QR^4$, $(PR)^{T^{11}} = PQR^3$, $(PR)^{T^{12}} = PR$,
$R^T = R^2$, $R^{T^2} = R^4$, $R^{T^3} = R^3$, $R^{T^4} = R$.

We need only construct one key square if we write down the elements in the
way in which they are written above. Then we have only to mark the end of
each cycle. Thus in our present case we have:

1 | P, Q, PQ | PR, QR^2, PQR^4, PR^3, QR, PQR^2, PR^4, QR^3, PQR, PR^2, QR^4,
PQR^3 | R, R^2, R^4, R^3 |

The vertical lines mark the cycles in which the elements are permuted by the
automorphisms. We then write down the key square in Table I. From this
key square we can easily obtain a set of 3 orthogonal squares by permuting the
rows within the cycles indicated. Because of space difficulties we give only the
first half of the square in Table II.

TABLE I
(rows and columns in the order listed above; the elements are numbered 1 to 20)

1        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
P        2  1  4  3 17 10 15 20 13  6 19 16  9 18  7 12  5 14 11  8
Q        3  4  1  2 13 18 11 16 17 14  7 20  5 10 19  8  9  6 15 12
PQ       4  3  2  1  9 14 19 12  5 18 15  8 17  6 11 20 13 10  7 16
PR       5 17 13  9 18 16  3 19 10 12  1  7  6 20  4 15 14  8  2 11
QR^2     6 10 18 14 16 19  5  4 20 11 13  1  8  7 17  2 12 15  9  3
PQR^4    7 15 11 19  3  5 20  6  2 17 12 14  1  9  8 18  4 13 16 10
PR^3     8 20 16 12 19  4  6 17  7  3 18 13 15  1 10  9 11  2 14  5
QR       9 13 17  5 10 20  2  7 18  8  4 19 14 16  1 11  6 12  3 15
PQR^2   10  6 14 18 12 11 17  3  8 19  9  2 20 15  5  1 16  7 13  4
PR^4    11 19  7 15  1 13 12 18  4  9 20 10  3 17 16  6  2  5  8 14
QR^3    12 16 20  8  7  1 14 13 19  2 10 17 11  4 18  5 15  3  6  9
PQR     13  9  5 17  6  8  1 15 14 20  3 11 18 12  2 19 10 16  4  7
PR^2    14 18 10  6 20  7  9  1 16 15 17  4 12 19 13  3  8 11  5  2
QR^4    15  7 19 11  4 17  8 10  1  5 16 18  2 13 20 14  3  9 12  6
PQR^3   16 12  8 20 15  2 18  9 11  1  6  5 19  3 14 17  7  4 10 13
R       17  5  9 13 14 12  4 11  6 16  2 15 10  8  3  7 18 20  1 19
R^2     18 14  6 10  8 15 13  2 12  7  5  3 16 11  9  4 20 19 17  1
R^4     19 11 15  7  2  9 16 14  3 13  8  6  4  5 12 10  1 17 20 18
R^3     20  8 12 16 11  3 10  5 15  4 14  9  7  2  6 13 19  1 18 17

TABLE II
(first ten columns; each compartment gives the entries of the three squares)

 1,1,1     2,2,2     3,3,3     4,4,4     5,5,5     6,6,6     7,7,7     8,8,8     9,9,9     10,10,10
 2,3,4     1,4,3     4,1,2     3,2,1     17,13,9   10,18,14  15,11,19  20,16,12  13,17,5   6,14,18
 3,4,2     4,3,1     1,2,4     2,1,3     13,9,17   18,14,10  11,19,15  16,12,20  17,5,13   14,18,6
 4,2,3     3,1,4     2,4,1     1,3,2     9,17,13   14,10,18  19,15,11  12,20,16  5,13,17   18,6,14
 5,6,7     17,10,15  13,18,11  9,14,19   18,16,3   16,19,5   3,5,20    19,4,6    10,20,2   12,11,17
 6,7,8     10,15,20  18,11,16  14,19,12  16,3,19   19,5,4    5,20,6    4,6,17    20,2,7    11,17,3
 7,8,9     15,20,13  11,16,17  19,12,5   3,19,10   5,4,20    20,6,2    6,17,7    2,7,18    17,3,8
 8,9,10    20,13,6   16,17,14  12,5,18   19,10,12  4,20,11   6,2,17    17,7,3    7,18,8    3,8,19
 9,10,11   13,6,19   17,14,7   5,18,15   10,12,1   20,11,13  2,17,12   7,3,18    18,8,4    8,19,9
 10,11,12  6,19,16   14,7,20   18,15,8   12,1,7    11,13,1   17,12,14  3,18,13   8,4,19    19,9,2
 11,12,13  19,16,9   7,20,5    15,8,17   1,7,6     13,1,8    12,14,1   18,13,15  4,19,14   9,2,20
 12,13,14  16,9,18   20,5,10   8,17,6    7,6,20    1,8,7     14,1,9    13,15,1   19,14,16  2,20,15
 13,14,15  9,18,7    5,10,19   17,6,11   6,20,4    8,7,17    1,9,8     15,1,10   14,16,1   20,15,5
 14,15,16  18,7,12   10,19,8   6,11,20   20,4,15   7,17,2    9,8,18    1,10,9    16,1,11   15,5,1
 15,16,5   7,12,17   19,8,13   11,20,9   4,15,18   17,2,16   8,18,3    10,9,19   1,11,10   5,1,12
 16,5,6    12,17,10  8,13,18   20,9,14   15,18,16  2,16,19   18,3,5    9,19,4    11,10,20  1,12,11
 17,18,19  5,14,11   9,6,15    13,10,7   14,8,2    12,15,9   4,13,16   11,2,14   6,12,3    16,7,13
 18,19,20  14,11,8   6,15,12   10,7,16   8,2,11    15,9,3    13,16,10  2,14,5    12,3,15   7,13,4
 19,20,17  11,8,5    15,12,9   7,16,13   2,11,14   9,3,12    16,10,4   14,5,11   3,15,6    13,4,16
 20,17,18  8,5,14    12,9,6    16,13,10  11,14,8   3,12,15   10,4,13   5,11,2    15,6,12   4,16,7
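The key square of side 20 and the three orthogonal squares derived from it can be generated mechanically. The following sketch assumes the representation of an element PᵃQᵇRᶜ by the exponent triple (a, b, c), with a and b taken mod 2 and c mod 5; the helper names are ours.

```python
def mul(x, y):
    """Product of two elements given as exponent triples (a, b, c)."""
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2, (x[2] + y[2]) % 5)

def T(x):
    """The automorphism with P -> Q, Q -> PQ, R -> R^2."""
    a, b, c = x
    return (b, (a + b) % 2, (2 * c) % 5)

identity, P, PR, R = (0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)

# Write the elements cycle by cycle: 1 | P, Q, PQ | PR, QR^2, ... | R, R^2, R^4, R^3
order = [identity]
for seed in (P, PR, R):
    g = seed
    while True:
        order.append(g)
        g = T(g)
        if g == seed:
            break
number = {g: k + 1 for k, g in enumerate(order)}

def power(x, i):
    """Apply T to x exactly i times."""
    for _ in range(i):
        x = T(x)
    return x

def latin_square(i):
    """L_i: row of element A is the key-square row of A^(T^i)."""
    return [[number[mul(power(r, i), c)] for c in order] for r in order]

def are_orthogonal(a, b):
    m = len(a)
    return len({(a[i][j], b[i][j]) for i in range(m) for j in range(m)}) == m * m

key = latin_square(0)
squares = [latin_square(i) for i in range(3)]
```

`key` is the key square, and the rows of `squares[1]` and `squares[2]` are the rows of `key` permuted within the cycles of T.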

One might hope that with other groups more than $r = \min_i\, p_i^{e_i} - 1$
orthogonal squares might be obtained. It has been shown however that using
any group and its automorphisms at most r orthogonal squares can be obtained.
A more general method based on groups is given in a recent paper (H. B.
Mann, "The construction of sets of orthogonal Latin squares," Annals of Math.
Stat., Vol. 13 (1942)). It can be shown that also with this more general method
no $4n + 2$ sided Graeco-Latin square can be constructed.

ON THE DEPENDENCE OF SAMPLING INSPECTION PLANS UPON
POPULATION DISTRIBUTIONS


By ALEXANDER M. MOOD
University of Texas¹


1. Introduction. The foundations of the science of quality control and 
quality determination have been laid by W. A. Shewhart [1, 2]. His ideas per- 
vade what follows, but they are too well known to require discussion here. There 
is, however, one that should be specifically mentioned, that of statistically con- 
trolled production, because it provides the justification for the basic assumption 
of this paper: When production is statistically controlled, there exists a probability, 
P(N, X), that a lot of size N will contain X defective items. Shewhart has given 
a complete discussion of assumptions of this nature. 

Sampling inspection of lots may take one of two courses:

(a) Item inspection, in which a lot is accepted or completely inspected on the
basis of one or more samples drawn from the lot.

(b) Lot inspection, in which a lot is accepted or rejected on the basis of one
or more samples drawn from the lot.

The former has been extensively studied by Dodge and Romig [3, 4, 5]; the latter
has received little attention, but some of the basic ideas of Dodge and Romig are
applicable to this case also.

In this paper the approach to the general problem of lot inspection will be 
different from that of Dodge and Romig in one important respect: The role of 
the population distribution function will be emphasized, whereas they have 
directed their attention to methods which require no knowledge of the popula- 
tion distribution. Their techniques are particularly valuable when a prob- 
ability distribution does not exist, that is, when production is not statistically 
controlled. The interest here will be in the inspection of lots which may be 
regarded as having been drawn from a statistical population. After the first 
sample from the first lot has been drawn, something is known of the distribution 
of that population, and as the inspection proceeds a great body of knowledge 
may be accumulated. Here, if ever, is a real opportunity to explore and to use 
a population distribution. The very nature of inspection supplies a continuous 
flow of information about it. To neglect this information would be wasteful 
indeed. 

It is, therefore, the object of this paper to point the way to more efficient in- 
spection procedures for situations in which production is statistically controlled. 
The inspection procedure will be considered to be an inferential process—on 
the basis of one or more samples, and with whatever information is available 
about the parent distribution, an inference will be made regarding the quality
of those items which have not been examined. A distinction is made between
the original lot and what remains of the lot after samples have been drawn.
The latter is the appropriate subject of the inference, inasmuch as the quality
of the sample is exactly known. The importance of this distinction will become
clear in the third section of the paper.

The subject is, unhappily, very briefly developed. The paper contains a few
fundamental results and some suggested procedures that may be used to obtain
results of more immediate practical value. Time and facilities were not
available for preparation of specific sampling plans.

¹ On leave to the War Department.











2. Notation and formulae. The conventional notations P(u), P(u | v), P(u, v)
will be used to denote the probability of u, of u given v, of u and v,
respectively. A lot will contain N items of which X are defective. A lot from
which one sample has been drawn will be called an "x-lot;" after i samples have
been drawn it will be referred to as an "$x^i$-lot." The number of items in the
i-th sample will be $n_i$ of which $x_i$ are defective, except that the
subscript will often be omitted when $i = 1$. The number of items in an
$x^k$-lot will be:

$$N_k = N - \sum_{i=1}^{k} n_i$$

of which

$$X_k = X - \sum_{i=1}^{k} x_i$$

are defective.

The probability of $x_i$ for a given $x^{i-1}$-lot is:

$$P(x_i \mid X_{i-1}) = \binom{n_i}{x_i} \frac{X_{i-1}^{(x_i)} (N_{i-1} - X_{i-1})^{(n_i - x_i)}}{N_{i-1}^{(n_i)}}, \tag{1}$$

where $\binom{n_i}{x_i}$ is the binomial coefficient, and

$$u^{(v)} = u(u - 1)(u - 2) \cdots (u - v + 1).$$

Under this conditional distribution, the m-th factorial moment of $x_i$ is:

$$E(x_i^{(m)} \mid X_{i-1}) = n_i^{(m)} X_{i-1}^{(m)} / N_{i-1}^{(m)}, \tag{2}$$

and the m-th factorial moment of $X_i$ is:

$$E(X_i^{(m)} \mid X_{i-1}) = N_i^{(m)} X_{i-1}^{(m)} / N_{i-1}^{(m)}. \tag{3}$$

Repeated application of (3) to (2) results in:

$$E(x_i^{(m)}) = n_i^{(m)} E(X^{(m)}) / N^{(m)}. \tag{4}$$

In similar fashion it may be shown that:

$$E\left(\prod_{i=1}^{k} x_i^{(m_i)}\right) = \left(\prod_{i=1}^{k} n_i^{(m_i)}\right) \frac{E\left(X^{(M)}\right)}{N^{(M)}}, \qquad M = \sum_{i=1}^{k} m_i. \tag{5}$$
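Formula (2) is easy to confirm numerically for a single sample. The sketch below uses exact fractions and assumes the usual hypergeometric form of (1) for a sample of n from a lot of N containing X defectives; the function names are ours.

```python
from fractions import Fraction
from math import comb

def falling(u, v):
    """Factorial power u^(v) = u(u-1)...(u-v+1); equals 1 when v = 0."""
    out = 1
    for k in range(v):
        out *= u - k
    return out

def factorial_moment(m, n, X, N):
    """E(x^(m) | X) by direct summation over the distribution (1)."""
    return sum(
        falling(x, m) * Fraction(comb(X, x) * comb(N - X, n - x), comb(N, n))
        for x in range(n + 1)
    )

def formula_2(m, n, X, N):
    """Right-hand side of (2): n^(m) X^(m) / N^(m)."""
    return Fraction(falling(n, m) * falling(X, m), falling(N, m))
```

With m = 1 this gives the familiar mean nX/N of the number of defectives in the sample.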
3. Single sampling. Consider a population of lots of fixed size N such that
the probability that a lot will contain X defective items is P(X). If x is the
number of defective items in a sample of size n drawn from one of these lots,
then the joint probability of x and X is:

$$P(x, X) = \binom{n}{x} \frac{X^{(x)} (N - X)^{(n - x)}}{N^{(n)}} P(X). \tag{6}$$

The fundamental result of this paper is:

THEOREM 1. The correlation between the number of defective items in the sample,
x, and the number of defective items in the remainder of the lot,
$X_1 = X - x$, is positive, zero, or negative according as the variance,
$\sigma_X^2$, of X is greater than, equal to, or less than $A - A^2/N$, where A
represents the expected value of X.

To prove this statement, one need merely compute the covariance between
x and $X_1$:

$$r_{xX_1} \sigma_x \sigma_{X_1} = \sum_{x, X} x(X - x)P(x, X) - E(x)(A - E(x)). \tag{7}$$

Summing first on x with the aid of (2):

$$r_{xX_1} \sigma_x \sigma_{X_1} = \sum_{X} \left(\frac{n}{N} X^2 - \frac{n^{(2)}}{N^{(2)}} X^{(2)} - \frac{n}{N} X\right) P(X) - E(x)(A - E(x))$$

which may be reduced to:

$$r_{xX_1} \sigma_x \sigma_{X_1} = \frac{n(N - n)}{N(N - 1)} \left[\sigma_X^2 - \left(A - \frac{A^2}{N}\right)\right] \tag{8}$$

by employing the definitions of A and $\sigma_X^2$ together with the relation,

$$E(x) = nA/N,$$

which follows from (4) on putting $m = 1$.

The fact that $A - A^2/N$ is the variance of a binomial distribution with mean
A and range N, suggests:

THEOREM 2. If X has the binomial distribution,

$$P(X) = \binom{N}{X} p^X (1 - p)^{N - X}, \tag{9}$$

then x and $X - x$ are independently distributed.

This statement is readily verified by substituting (9) in (6), and $X_1$ for
$X - x$; a rearrangement of factors then gives:

$$P(x, X_1) = \left[\binom{n}{x} p^x (1 - p)^{n - x}\right] \left[\binom{N - n}{X_1} p^{X_1} (1 - p)^{N - n - X_1}\right].$$

It is clear that additional samples drawn from such lots will have the same
property. Thus, sampling of lots drawn from a binomial population will
provide no basis whatsoever for inferences concerning the remainder of the lot.
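Theorems 1 and 2 can be illustrated with exact arithmetic. The sketch below (helper names are ours) computes the covariance between the sample count x and the remainder X − x directly from the joint distribution, and compares it with n(N − n)/(N(N − 1)) times the quantity σ² − (A − A²/N), which is the form the covariance takes when the sums are carried out.

```python
from fractions import Fraction
from math import comb

def joint(n, N, prior):
    """P(x, X) for a sample of n from a lot of N; prior[X] = P(X)."""
    return {
        (x, X): Fraction(comb(X, x) * comb(N - X, n - x), comb(N, n)) * prior[X]
        for X in range(N + 1) for x in range(n + 1)
    }

def covariance_sample_remainder(n, N, prior):
    """Covariance of x and X - x, computed directly from the joint distribution."""
    pj = joint(n, N, prior)
    e_x = sum(p * x for (x, X), p in pj.items())
    e_r = sum(p * (X - x) for (x, X), p in pj.items())
    e_xr = sum(p * x * (X - x) for (x, X), p in pj.items())
    return e_xr - e_x * e_r

def closed_form(n, N, prior):
    """n(N-n)/(N(N-1)) * [variance of X - (A - A^2/N)]."""
    A = sum(X * p for X, p in enumerate(prior))
    var = sum((X - A) ** 2 * p for X, p in enumerate(prior))
    return Fraction(n * (N - n), N * (N - 1)) * (var - (A - A * A / N))

def binomial(N, p):
    """Binomial prior; p should be a Fraction to keep the arithmetic exact."""
    return [comb(N, X) * p ** X * (1 - p) ** (N - X) for X in range(N + 1)]

N, n = 10, 3
flat = [Fraction(1, N + 1)] * (N + 1)                          # flatter than binomial
point = [Fraction(1) if X == 4 else Fraction(0) for X in range(N + 1)]  # sharper
binom = binomial(N, Fraction(1, 4))
```

The covariance is positive for the flat prior, exactly zero for the binomial prior, and negative when every lot contains the same number of defectives.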
The question naturally arises as to whether distributions P(X) exist for which
$r_{xX_1} = \pm 1$.

THEOREM 3. If

$$P(X) = \begin{cases} 1, & X = A, \quad A \neq 0 \text{ or } N \\ 0, & X \neq A, \end{cases} \tag{10}$$

then $r_{xX_1} = -1$; if

$$P(X) = \begin{cases} p, & X = 0 \\ 1 - p, & X = N \\ 0, & X = 1, 2, \ldots, N - 1, \end{cases} \tag{11}$$

then $r_{xX_1} = 1$. These are the only distributions which lead to these
values of $r_{xX_1}$.

It is first necessary to compute

$$\sigma_x^2 = \frac{n^{(2)}}{N^{(2)}} \left[\sigma_X^2 + \frac{N - n}{n - 1}\left(A - \frac{A^2}{N}\right)\right] \tag{12}$$

$$\sigma_{X_1}^2 = \frac{(N - n)^{(2)}}{N^{(2)}} \left[\sigma_X^2 + \frac{n}{N - n - 1}\left(A - \frac{A^2}{N}\right)\right] \tag{13}$$

by means of (2), (3), and (4). These, together with (8), may then be used to
reduce the condition, $r_{xX_1}^2 = 1$, to the following condition on P(X):
either

$$\sum_X (X - A)^2 P(X) = 0, \tag{14}$$

or

$$\sum_X X(N - X)P(X) = 0, \tag{15}$$

whence the theorem follows at once. The distributions defined by (10) and
(11) will be referred to hereafter as $P_-(X)$ and $P_+(X)$ respectively.

THEOREM 4. The correlation, $r_{xX}$, between x and X is positive unless X is
distributed by $P_-(X)$ in which case it is zero.

Computing the covariance by means of (2), (3), and (4), one finds that

$$r_{xX} \sigma_x \sigma_X = n\sigma_X^2/N. \tag{16}$$

The reason for so carefully distinguishing between the x-lot and the original
lot is now apparent. While the number of defective items in the sample is
always positively correlated with the number of defective items in the original
lot (Theorem 4), it may be negatively correlated with the number of defective
items in the x-lot (Theorem 1). The normal practice is to reject (or completely
inspect) the x-lot if the sample has an excessive number of defectives, but when
the distribution is sharper than a binomial distribution
($\sigma_X^2 < A - A^2/N$) just the reverse should be done. It is assumed, of
course, that defective items would be removed from the sample during its
inspection when the inspection was non-destructive.
It is clear that the basic rationale of a sampling inspection plan depends on
the condition of Theorem 1. Having chosen a sample size n and an acceptance
number a (defined by Dodge and Romig [1]), an x-lot would be

    Accepted when $x \le a$   if $\sigma_X^2 > A - A^2/N$
    Rejected when $x > a$     if $\sigma_X^2 > A - A^2/N$

    Accepted when $x > a$     if $\sigma_X^2 < A - A^2/N$
    Rejected when $x \le a$   if $\sigma_X^2 < A - A^2/N$.

Thus, it is essential that the first two moments of the population distribution
be known accurately enough to determine the sign of
$\sigma_X^2 - (A - A^2/N)$ before an efficient inspection plan can be devised.


4. Multiple sampling. In this section are given similar criteria for guidance 
in formulating more elaborate sampling plans. The actual computations are 
elementary and will be omitted. 


THEOREM 5. The mean and variance of the number of defective items in a sample
drawn from an $x^{i-1}$-lot are:

$$E(x_i) = n_iA/N \tag{17}$$

$$\sigma_{x_i}^2 = \frac{n_i^{(2)}}{N^{(2)}} \left[\sigma_X^2 + \frac{N - n_i}{n_i - 1}\left(A - \frac{A^2}{N}\right)\right]. \tag{18}$$

THEOREM 6. The mean and variance of the number of defective items in an
$x^i$-lot are:

$$E(X_i) = N_iA/N \tag{19}$$

$$\sigma_{X_i}^2 = \frac{N_i^{(2)}}{N^{(2)}} \left[\sigma_X^2 + \frac{N - N_i}{N_i - 1}\left(A - \frac{A^2}{N}\right)\right]. \tag{20}$$

THEOREM 7. The correlation between the numbers of defective items in the i-th
and j-th samples is:

$$r_{x_ix_j} \sigma_{x_i} \sigma_{x_j} = \frac{n_in_j}{N^{(2)}} \left[\sigma_X^2 - \left(A - \frac{A^2}{N}\right)\right]. \tag{21}$$

THEOREM 8. The correlation between the numbers of defective items in the i-th
sample and the $x^j$-lot is given by:

$$r_{x_iX_j} \sigma_{x_i} \sigma_{X_j} = \frac{n_i(N_j - 1)}{N^{(2)}} \left[\sigma_X^2 + \frac{N - N_j}{N_j - 1}\left(A - \frac{A^2}{N}\right)\right], \qquad i > j \tag{22}$$

$$r_{x_iX_j} \sigma_{x_i} \sigma_{X_j} = \frac{n_iN_j}{N^{(2)}} \left[\sigma_X^2 - \left(A - \frac{A^2}{N}\right)\right], \qquad i \le j. \tag{23}$$

Thus, the correlation is always positive if the sample is part of the lot even
when X has the distribution $P_-(X)$, except only the case covered by Theorem 4
when $j = 0$. The correlations (21) and (23) will be positive or negative in
accordance with the condition of Theorem 1. The extreme values of all these
correlations are again given by the distributions $P_-(X)$ and $P_+(X)$ defined
in Theorem 3. When $P(X) = P_+(X)$, they all become plus one; when
$P(X) = P_-(X)$, they become:

$$r_{x_ix_j} = -\sqrt{n_in_j/(N - n_i)(N - n_j)}, \tag{24}$$

$$r_{x_iX_j} = \sqrt{n_i(N - N_j)/N_j(N - n_i)}, \qquad i > j, \tag{25}$$

$$r_{x_iX_j} = -\sqrt{n_iN_j/(N - n_i)(N - N_j)}, \qquad i \le j. \tag{26}$$

For $i = j = 1$, this last expression becomes minus one in accordance with
Theorem 3.


5. Formulation of inspection plans. In practice, the formulation of specific 
sampling inspection plans would naturally begin with the examination of a 
preliminary sample (or samples) in order to estimate the first two moments of 
the population distribution. It would then be convenient to have some simple 
standard functional form which could be fitted to the distribution by means of 
these first two moments. Such a standard form must obviously contain two 
arbitrary parameters and should represent a discrete distribution with range N. 
The simplest function known to the author which satisfies these conditions is: 


$$P_1(X) = \binom{N}{X} \frac{C^{(X)} D^{(N - X)}}{(C + D)^{(N)}}. \tag{27}$$

But it will be seen that this distribution is always sharper than the binomial
distribution with the same range and mean. Hence a second form is suggested,

$$P_2(X) = \binom{N}{X} \frac{(C + X)^{(X)} (D + N - X)^{(N - X)}}{(C + D + N + 1)^{(N)}}, \tag{28}$$

which, it turns out, is always flatter than the binomial distribution with the
same range and mean. It is proposed that these two functions be used as
standard forms in the belief that the simplicity of their functional form is a
convenience which outweighs the inconvenience of having to study two separate
functions.

The factorial moments of these distributions are:

$$\sum_{X=0}^{N} X^{(m)} P_1(X) = N^{(m)} C^{(m)} / (C + D)^{(m)} \tag{29}$$

$$\sum_{X=0}^{N} X^{(m)} P_2(X) = N^{(m)} (C + m)^{(m)} / (C + D + m + 1)^{(m)}. \tag{30}$$

The variances are:

$$\sum_{X=0}^{N} (X - A)^2 P_1(X) = \frac{NCD(C + D - N)}{(C + D)^2 (C + D - 1)} \tag{31}$$

$$\sum_{X=0}^{N} (X - A)^2 P_2(X) = \frac{N(C + 1)(D + 1)(N + C + D + 2)}{(C + D + 2)^2 (C + D + 3)}. \tag{32}$$
Examination of the expression, $\sigma_X^2 - (A - A^2/N)$, reveals that for
$P_1(X)$ it is always negative, while for $P_2(X)$ it is always positive. Both
$P_1(X)$ and $P_2(X)$ approach the binomial distribution when C and D become
large in a fixed ratio. $P_1(X)$ becomes $P_-(X)$ when $C = A$ and
$D = N - A$. As C and D become larger, the distribution becomes flatter until
in the limit it is the binomial distribution. $P_2(X)$ becomes the rectangular
distribution, $P(X) = 1/(N + 1)$, when $C = D = 0$, and becomes sharper as C
and D increase.

The two distribution functions will not serve to approximate U-shaped
distributions, and $P_1(X)$ has the disadvantage that C and D must be integers
when they are less than N if negative probabilities are to be avoided, but
since $C + D$ will be greater than or equal to N in any case, and much greater
than N in most cases, this is not a serious limitation. The two functions are
reproduced when the marginal distributions for samples are computed:

$$P_1(x_i) = \sum_X P(x_i \mid X) P_1(X) = \binom{n_i}{x_i} \frac{C^{(x_i)} D^{(n_i - x_i)}}{(C + D)^{(n_i)}} \tag{33}$$

$$P_2(x_i) = \sum_X P(x_i \mid X) P_2(X) = \binom{n_i}{x_i} \frac{(C + x_i)^{(x_i)} (D + n_i - x_i)^{(n_i - x_i)}}{(C + D + n_i + 1)^{(n_i)}}. \tag{34}$$


This is a most valuable property for two reasons. In the first place, it will 
appreciably facilitate the tedious machine calculations necessary in the work of 
providing specific optimum sampling plans. In the second place, it will simplify 
the study of the population distribution of lots by means of samples from those 
lots. 

These two distributions should, then, provide an adequate basis for the 
formulation of sampling inspection plans in most circumstances. 
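The two standard forms can be tabulated and checked with exact arithmetic. The sketch below takes P₁ to be the form built from the factorial powers C^(X) D^(N−X)/(C+D)^(N) and P₂ the form built from (C+X)^(X) (D+N−X)^(N−X)/(C+D+N+1)^(N), each multiplied by the binomial coefficient; these readings, the helper names, and the parameter values are assumptions made for the illustration.

```python
from fractions import Fraction
from math import comb

def falling(u, v):
    """Factorial power u^(v) = u(u-1)...(u-v+1); equals 1 when v = 0."""
    out = 1
    for k in range(v):
        out *= u - k
    return out

def P1(X, N, C, D):
    """First standard form (sharper than the binomial); C and D integers with C + D >= N."""
    return Fraction(comb(N, X) * falling(C, X) * falling(D, N - X), falling(C + D, N))

def P2(X, N, C, D):
    """Second standard form (flatter than the binomial)."""
    return Fraction(
        comb(N, X) * falling(C + X, X) * falling(D + N - X, N - X),
        falling(C + D + N + 1, N),
    )

def mean_and_variance(pmf):
    """Mean A and variance of a distribution given as a list pmf[X]."""
    A = sum(X * p for X, p in enumerate(pmf))
    var = sum((X - A) ** 2 * p for X, p in enumerate(pmf))
    return A, var

N = 6
pmf1 = [P1(X, N, 10, 15) for X in range(N + 1)]
pmf2 = [P2(X, N, 2, 3) for X in range(N + 1)]
A1, var1 = mean_and_variance(pmf1)
A2, var2 = mean_and_variance(pmf2)
```

Comparing each variance with A − A²/N shows the first form lying below the binomial value and the second above it, and setting C = D = 0 in the second form gives the rectangular distribution.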


6. Efficiency of sampling inspection. There are two aspects to the efficiency 
of an item inspection plan: the inspection aspect, which would be measured by 
the proportion of defective items eliminated, and the sampling aspect, which 
would be measured by the difference between the proportions of defective and 
good items examined. These two measures are primarily functions of the 
amount of inspection; the former will be large when the amount of inspection is 
large, and the latter will ordinarily be large when the amount of inspection is 
small. They will not, therefore, serve as useful criteria for excellence. The 
measure to be used here is: 


$$E = R_d - R_g \tag{35}$$

where $R_d$ is the proportion of defective items examined, and $R_g$ is the
proportion of good items examined. It will be zero when the inspection plan is
not at all selective, and will be 100% when all of the defective items and none
of the good items are examined. It measures both aspects mentioned above, but
has the disadvantage that it emphasizes one or the other for different amounts
of inspection. It is not, therefore, a particularly good measure of efficiency,
but it is a good criterion. It should ordinarily be maximized.

For single sampling with an acceptance number, a, and with a population
distribution sharper than the binomial, the number of items inspected on the
average per lot is:

$$I = n + (N - n) \sum_{x=0}^{a} P(x) \tag{36}$$

and the number of defective items inspected on the average per lot is:

$$B = E(x) + \sum_{x=0}^{a} \sum_{X} (X - x)P(x, X). \tag{37}$$

The efficiency will be:

$$E = B/A - (I - B)/(N - A) \tag{38}$$

which may be put in the form:

$$E = \frac{N(N - n)}{A(N - A)} \sum_{x=0}^{a} \sum_{X} \left(\frac{X - x}{N - n} - \frac{A}{N}\right) P(x, X) \tag{39}$$

after substituting (36) and (37). This may be further simplified to:

$$E = \frac{N(N - n)}{A(N - A)} \sum_{x=0}^{a} \left[\frac{x + 1}{n + 1} P_{n+1}(x + 1) - \frac{A}{N} P_n(x)\right], \tag{40}$$

where $P_n(x)$ is the marginal distribution of x for samples of size n. For
distributions flatter than the binomial, the limits of the summations on x
would be $a + 1$ to n throughout, instead of 0 to a.

THEOREM 9. For a fixed value of n, the acceptance number which maximizes E
is $a = E(x)$ when X is distributed by $P_1(X)$ or $P_2(X)$.

The expression in the brackets of (40) becomes:

$$\frac{E(x) - x}{C + D - n} P_n(x) \tag{41}$$

when (33) is substituted for P(x), and becomes:

$$\frac{x - E(x)}{C + D + n + 2} P_n(x) \tag{42}$$

when (34) is substituted for P(x). This theorem is true for a wider class of
distribution functions, P(X), but is not worth pursuing too deeply because its
main value is in the light it throws on the general nature of inspection plans.
It will be a rare case in practice when n is fixed and a is unrestricted. Some
idea of the manner in which E depends on population distributions can be
attained by computing it for some simple distributions, and by examination of
equation (40).




E can be 100% only when all submitted items are defective, but it will
obviously be very near 100% when the distribution is $P_+(X)$ if samples of one
are used. However, a more reasonable maximum might be 50% which is the
largest possible value when the distribution is rectangular (as is shown in the
next section). As the distribution becomes sharper, the maximum efficiency
decreases to zero when the binomial distribution is reached. As the distribution
becomes still sharper, the efficiency increases until it again reaches 50% for the
distribution $P_-(X)$. Thus the efficiency is limited, and, in fact, will ordinarily
be further reduced by conditions (fixed amount of inspection, or fixed outgoing 
quality level, for example) which will not allow the unrestricted maximum 
efficiency to be used. 


7. Sampling plans for the rectangular distribution. Excluding the extreme
distributions, $P_-(X)$ and $P_+(X)$, the distribution which provides the simplest
illustration of some of the ideas above is the rectangular one:

(43)    $P(X) = 1/(N + 1), \quad X = 0, 1, 2, \ldots, N,$

the mean and variance of which are:

(44)    $A = N/2, \qquad \sigma_X^2 = N(N + 2)/12.$

The marginal distribution of x is:

(45)    $P(x) = 1/(n + 1),$

and the efficiency is:

(46)    $E = \frac{2(N - n)(n - a)(a + 1)}{N(n + 1)(n + 2)}.$

The values of n and a which maximize this expression are:

(47)    $n = \sqrt{N + 2} - 2, \qquad a = (\sqrt{N + 2} - 3)/2,$

whence

(48)    $E_{\max} = \frac{(\sqrt{N + 2} - 1)^2}{2N},$


or nearly 50% for large N. This plan eliminates almost 75% of the defective 
items and entails examination of about 25% of the good items. 50% of all 
items will be inspected. 
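
The unrestricted optimum can be checked numerically. The sketch below assumes the efficiency of a plan (n, a) under the rectangular distribution is E = 2(N - n)(n - a)(a + 1) / [N(n + 1)(n + 2)], which is what (40) gives when the sum runs from a + 1 to n; the function names and the lot size N = 398 are illustrative choices, not taken from the paper. It searches all integer plans and compares the best one with the continuous solution n = sqrt(N + 2) - 2, a = (sqrt(N + 2) - 3)/2.

```python
from math import sqrt

def efficiency(N, n, a):
    """Efficiency of single-sampling plan (n, a) for the rectangular
    distribution, in the form assumed in the lead-in above."""
    return 2 * (N - n) * (n - a) * (a + 1) / (N * (n + 1) * (n + 2))

def best_integer_plan(N):
    """Exhaustive search over integer plans with 1 <= n < N and 0 <= a < n.
    Returns the tuple (efficiency, n, a) with the largest efficiency."""
    return max((efficiency(N, n, a), n, a)
               for n in range(1, N) for a in range(n))

N = 398                                  # illustrative: sqrt(N + 2) is exactly 20
E_best, n_best, a_best = best_integer_plan(N)

n_cont = sqrt(N + 2) - 2                 # continuous maximizer of n
a_cont = (sqrt(N + 2) - 3) / 2           # continuous maximizer of a
E_cont = (sqrt(N + 2) - 1) ** 2 / (2 * N)

print(f"integer optimum:    n={n_best}, a={a_best}, E={E_best:.4f}")
print(f"continuous optimum: n={n_cont:.1f}, a={a_cont:.1f}, E={E_cont:.4f}")
```

Because n and a must be integers in practice, the best integer plan sits next to the continuous solution and its efficiency falls slightly short of the continuous maximum.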

If the proportion of items to be inspected is fixed at r, then the maximization
of E is subject to the restriction:

(49)    $rN = n + (N - n)(n - a)/(n + 1)$

and results in:

(50)    $n = \frac{N(r - 2) + \sqrt{N^2 (2 - r)^2 + N[N(1 - r) - 1][rN - 2(1 - r)]}}{N(1 - r) - 1}$

or for large N,

(51)    $n = \sqrt{rN/(1 - r)}, \qquad a = \sqrt{r(1 - r)N}.$


If the average outgoing quality (as defined by Dodge and Romig) is to be fixed
at p (the proportion of defectives after inspection on the average), then the
maximization of E is subject to the condition:

(52)    $p = \frac{(N - n)(a + 2)^{(2)}}{N(n + 2)^{(2)} + (N - n)(a + 2)^{(2)}}$

and results in the relation:

(53)    $(N - n)(n - a) = (a + 1)(n + 1)(n + 2).$


When N is large relative to 1/p, the solution of these last two equations is
approximately:

(54)    $n = \sqrt{N} \, \sqrt{\sqrt{\tfrac{1}{p} - 1} - 1}, \qquad a = n \sqrt{\tfrac{p}{1 - p}}.$


The same result would have been obtained had the amount of inspection been 
minimized subject to (52). 


8. Summary. Methods of sampling inspection in current use have been made 
independent of any population distribution that may exist. When production 
is statistically controlled, a population distribution may be postulated. In 
such circumstances it is to be expected that knowledge gained of the population 
by repeated sampling will be a valuable aid in specifying efficient sampling 
inspection techniques. This paper is a preliminary investigation of the relation 
of lot sampling inspection plans to population distributions. 

Lots are assumed to be drawn from a population such that there is a unique 
probability the lot will contain a specified number of defective items. It is 
shown that: 

1. The number of defective items in a sample from a lot is positively or nega- 
tively correlated with the number of defective items in the remainder of 
the lot according as the population distribution is “flatter” than or
“sharper” than a binomial distribution. Distributions are found for which
this correlation is plus or minus one.

2. If the distribution is the binomial one, the number of defective items in
the sample is distributed independently of the number of defective items 
in the remainder of the lot. Thus a sample can furnish no basis for an 
inference concerning the remainder of the lot. 

3. The correlation between the number of defective items in the sample and 
the number of defective items in the original lot is positive. 

These results are generalized for repeated sampling of one lot. 

There is discussed a standard functional form which can ordinarily be fitted 
to population distribution functions for purposes of constructing sampling 
inspection plans. 

It is shown, for a class of distribution functions, that a single sampling plan for 
nondestructive inspection will be most efficient in a certain sense when the 
acceptance number is equal to the expected number of defective items in the 
sample. 

Optimum single sampling plans for nondestructive inspection of lots with a 
rectangular probability distribution are determined for restricted amount of 
inspection and for restricted average outgoing quality.

ON CARD MATCHING
By T. W. ANDERSON
Princeton University 


1. Introduction. Several authors have discussed the probability of obtaining 
a given number of matched pairs of cards under conditions of random pairing of 
two decks of arbitrary composition. The exact expression for this probability 
(equation (6)) is ordinarily too complicated for use in computing significance 
levels. This is especially true for certain practical applications. For example, 
in a square two-way contingency table in which the categories corresponding to 
rows are identical with those for columns, the sum of the entries in the diagonal 
cells has this distribution. Intuitively one would suspect that the distribution 
is asymptotically normal, as suggested by several authors. In the following 
section proof is given that the number of matched cards is asymptotically 
normally distributed when the total number of cards in each of the two decks
approaches infinity with the proportion of cards in each suit of each deck
remaining fixed. The form of the limiting distribution can then be used in computing
approximate significance levels. 

A problem of some interest to psychologists is that of determining whether an 
individual has matched two series of items better than could have been done “by
chance”; for instance, whether a graphologist has matched personality
descriptions with specimens of handwriting better than by chance. The problem can
also be phrased in terms of card matching under random pairing of two identical 
decks each of a given number of different cards. This will be recognized as a 
classical problem of probability theory: Let tickets numbered from 1 to n be 
placed in a hat. If the tickets are drawn one by one from the hat, what is the 
probability that the number of the drawing will coincide with the number drawn 
a specified number of times? It is clear how the analogous problem of matching
cards of three or more identical decks of a given number of different cards arises 
(e.g., matching appearance, personality, and handwriting). The latter part of 
the present paper is concerned with this problem. Battin [1] has displayed a 
generating function for the probability of obtaining a given number of matched 
cards between any number of decks of arbitrary composition. Battin’s generat- 
ing function is used to derive explicitly the probability of obtaining a specified 
number of matched cards and the moments of the distribution. 


2. The Limiting Distribution of the Number of Matched Cards. In the
ordinary card matching problem one is interested in the number of matchings
when two decks, say $D_1$ and $D_2$, are paired randomly. Let $D_1$ consist of
$n_{11}, n_{12}, \ldots, n_{1k}$ cards of suits $S_1, S_2, \ldots, S_k$, respectively, and let $D_2$ consist of
$n_{21}, n_{22}, \ldots, n_{2k}$ cards of suits $S_1, S_2, \ldots, S_k$, respectively (any $n_{ai}$ can be 0),
where

$\sum_{i=1}^{k} n_{1i} = \sum_{i=1}^{k} n_{2i} = n.$

Let $t_{ij}$ (i, j = 1, 2, ..., k) be the number of pairings each involving a card from
$D_1$ of suit $S_i$ and a card from $D_2$ of suit $S_j$. It is easily seen that the probability
of a set $\|t_{ij}\|$ under random pairing is the same as that associated with the
entries $\|t_{ij}\|$ in a k by k contingency table [2] for which the row totals are fixed
as $n_{11}, n_{12}, \ldots, n_{1k}$, and the column totals are fixed as $n_{21}, n_{22}, \ldots, n_{2k}$, i.e.

(1)    $P(t_{ij}) = \frac{\prod_{i=1}^{k} n_{1i}! \prod_{j=1}^{k} n_{2j}!}{n! \prod_{i,j=1}^{k} t_{ij}!}.$

The probability of obtaining h matchings is the same as that of the sum of
diagonal terms in a square contingency table, i.e., $h = \sum_{i} t_{ii}$. In fact, in
practical cases, the problem frequently arises in this manner: If two individuals each
classify n objects into k categories, h is the number of objects on whose
classification they agree.

The distribution (1) has essentially $(k - 1)^2$ variables since there are 2k - 1
linear restrictions imposed on the $t_{ij}$. It is easy to verify that, for fixed
$n_{1i}/n = m_{1i}$, say, and fixed $n_{2j}/n = m_{2j}$, say, the distribution (1) approaches the normal
distribution in $(k - 1)^2$ linearly independent variables, as n approaches infinity.
Let us substitute

$x_{ij} = \frac{t_{ij} - n m_{1i} m_{2j}}{\sqrt{n}} \quad (i, j = 1, 2, \ldots, k),$

use Stirling's formula for each factorial in (1), and take the logarithm. The
argument proceeds in a manner similar to the classical case of the limit of the
binomial distribution.

Since there are imposed linear restrictions on the $t_{ij}$,

$\sum_{j=1}^{k} t_{ij} = n \cdot m_{1i} \quad (i = 1, 2, \ldots, k);$
$\sum_{i=1}^{k} t_{ij} = n \cdot m_{2j} \quad (j = 1, 2, \ldots, k),$

there are also restrictions on the $x_{ij}$, namely,

$\sum_{j=1}^{k} x_{ij} = \sum_{i=1}^{k} x_{ij} = 0.$

Hence there are $(k - 1)^2$ linearly independent $x_{ij}$. If we choose $x_{ij}$ (i, j =
1, 2, ..., k - 1) as the linearly independent variables, the limiting probability
element as n approaches infinity, is

(2)    $\frac{1}{(2\pi)^{(k-1)^2/2} \left( \prod_{i=1}^{k} m_{1i} \prod_{j=1}^{k} m_{2j} \right)^{(k-1)/2}} \, e^{-Q/2} \prod_{i,j=1}^{k-1} dx_{ij},$

where

$Q = \sum_{i,j=1}^{k} \frac{x_{ij}^2}{m_{1i} m_{2j}}$

is written in terms of all the $x_{ij}$ with the understanding that the linearly
dependent variables are linear functions of the independent variables.
Now h - E(h) is simply a linear combination of $x_{ij}$, namely,

$h - E(h) = \sqrt{n} \sum_{i=1}^{k} x_{ii}.$
Hence, it follows that 
h — E(h) 
V non 

is asymptotically normally distributed with mean zero and variance unity. For 
large n, then, it is possible to use the normal distribution to approximate signifi- 
cance levels for h. 
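
A minimal sketch of how the normal approximation might be used for a significance level. The mean E(h) is the sum of $n_{1i} n_{2i}$ divided by n. The finite-n variance expression coded below is a standard result for matching two decks of arbitrary composition and is an assumption here, since it is not written out in the text above; the function names and the half-unit continuity correction are likewise illustrative choices.

```python
from math import erfc, sqrt

def match_moments(deck1, deck2):
    """Mean and variance of the number of matches h.
    deck1[i] and deck2[i] are the numbers of cards of suit i in each deck.
    The variance expression is an assumed standard formula (see lead-in)."""
    n = sum(deck1)
    assert n == sum(deck2) and n > 1, "decks must have the same size n > 1"
    s1 = sum(a * b for a, b in zip(deck1, deck2))
    s2 = sum(a * b * (a + b) for a, b in zip(deck1, deck2))
    mean = s1 / n
    var = (s1 * s1 - n * s2 + n * n * s1) / (n * n * (n - 1))
    return mean, var

def upper_tail(h, deck1, deck2):
    """Approximate P(number of matches >= h) from the normal limit,
    using a continuity correction of one half."""
    mean, var = match_moments(deck1, deck2)
    z = (h - 0.5 - mean) / sqrt(var)
    return 0.5 * erfc(z / sqrt(2))

# Two decks of 25 cards, five suits of five cards each.
suits = [5, 5, 5, 5, 5]
mean, var = match_moments(suits, suits)
p_value = upper_tail(10, suits, suits)
print(f"E(h) = {mean:.3f}, var(h) = {var:.3f}, P(h >= 10) ~ {p_value:.4f}")
```

For small n the exact law (6) remains preferable; the approximation is intended for the large-n situation discussed above.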

Of course, any other linear combination of the entries $t_{ij}$ is asymptotically
normally distributed. The quantity Q in (2) can be recognized as the Pearson
$\chi^2$ for contingency tables, and the above constitutes proof that it actually has the
$\chi^2$ distribution with $(k - 1)^2$ degrees of freedom.


3. Matchings between three or more decks. There are instances, such as
the classification of n objects into k categories by 3 or more individuals, in which
one is interested in the matchings of three decks or more. For any number of
decks one can prove in a manner exactly analogous to §2 that the distribution
of the number of matchings is asymptotically normal. Here the demonstration
is indicated for three decks. Let us consider three decks $D_a$ (a = 1, 2, 3) with
$n_{a1}, n_{a2}, \ldots, n_{ak}$ cards of suits $S_1, S_2, \ldots, S_k$, respectively. Let $t_{gij}$ be the
number of triplets consisting of a card from $S_g$ of $D_1$, a card from $S_i$ of $D_2$, and
a card from $S_j$ of $D_3$ under random formation of triplets (i.e., laying down the
three shuffled decks side by side).

The probability law of the set $\{t_{gij}\}$ can be derived by the consideration of the
generating function,

(3)    $(x_1 y_1 z_1 + x_1 y_1 z_2 + \cdots + x_1 y_1 z_k + \cdots + x_k y_k z_k)^n = \sum \frac{n!}{\prod_{g,i,j} t_{gij}!} \prod_{g,i,j} (x_g y_i z_j)^{t_{gij}},$

where the summation extends over all the partitions $\{t_{gij}\}$ of n. The number of
ways of deriving the set $\{t_{gij}\}$ is the coefficient of $\prod_{g,i,j} (x_g y_i z_j)^{t_{gij}}$, namely,

$\frac{n!}{\prod_{g,i,j} t_{gij}!},$

where $\sum_{i,j} t_{gij} = n_{1g}$, $\sum_{g,j} t_{gij} = n_{2i}$, and $\sum_{g,i} t_{gij} = n_{3j}$.
The total number of ways of getting the marginal totals $n_{1g}$, $n_{2i}$, and $n_{3j}$ is the
coefficient of $\prod_{g} x_g^{n_{1g}} \prod_{i} y_i^{n_{2i}} \prod_{j} z_j^{n_{3j}}$ in (3); that is, in

$\left( \sum_{g,i,j} x_g y_i z_j \right)^n = \left( \sum_g x_g \right)^n \left( \sum_i y_i \right)^n \left( \sum_j z_j \right)^n.$

Hence

(4)    $P(t_{gij}) = \frac{\prod_{g} n_{1g}! \prod_{i} n_{2i}! \prod_{j} n_{3j}!}{(n!)^2 \prod_{g,i,j} t_{gij}!}.$

This formula is analogous to (1) and, indeed, reduces to (1) for $n_{31} = n$, $n_{3j} = 0$
(j = 2, 3, ..., k). This is the probability associated with a three-way
contingency table (k by k by k). For a contingency table, k by l by m, this
probability would be (4) with the limits on g of 1 and k; on i, 1 and l; and on j, 1
and m.

For fixed values of the ratios $n_{ai}/n = m_{ai}$ (a = 1, 2, 3; i = 1, 2, ..., k), say,
the $k^3 - 3k + 2$ linearly independent variates in the set $\{t_{gij}\}$ are asymptotically
normally distributed. To demonstrate this, substitute

$x_{gij} = \frac{t_{gij} - n m_{1g} m_{2i} m_{3j}}{\sqrt{n}} \quad (g, i, j = 1, 2, \ldots, k)$

into (4) and use Stirling's approximation. There are 3k - 2 independent linear
restrictions on the $x_{gij}$, namely,

$\sum_{i,j=1}^{k} x_{gij} = \sum_{g,j=1}^{k} x_{gij} = \sum_{g,i=1}^{k} x_{gij} = 0.$

Therefore, there are $k^3 - 3k + 2$ x's which are unrestricted. Using these
variables, we find that the limiting probability element of these $x_{gij}$ is

(5)    $\frac{1}{(2\pi)^{(k^3 - 3k + 2)/2} \left( \prod_{g} m_{1g} \prod_{i} m_{2i} \prod_{j} m_{3j} \right)^{(k^2 - 1)/2}} \, e^{-Q/2} \prod dx_{gij},$

where

$Q = \sum_{g,i,j=1}^{k} \frac{x_{gij}^2}{m_{1g} m_{2i} m_{3j}},$

and the product of differentials is of $k^3 - 3k + 2$ variables. The number of
matched triplets u, say, is the sum $\sum_{i=1}^{k} t_{iii}$, and we have

$\frac{u - E(u)}{\sqrt{n}} = \sum_{i=1}^{k} x_{iii}.$

From these facts it follows that $\frac{u - E(u)}{\sqrt{n}}$ is asymptotically normally distributed.

The above results may be easily generalized. In a q-way contingency table
with fixed marginal totals $n \cdot m_{ai}$ (a = 1, 2, ..., q; i = 1, 2, ..., k), the
probability of a set $\{t_{gi \cdots j}\}$ is

$\frac{\prod_{a=1}^{q} \prod_{i=1}^{k} (n \cdot m_{ai})!}{(n!)^{q-1} \prod_{g,i,\ldots,j=1}^{k} t_{gi \cdots j}!}.$

The entries minus their respective means and divided by $\sqrt{n}$, namely,

$x_{gi \cdots j} = \frac{t_{gi \cdots j} - n m_{1g} m_{2i} \cdots m_{qj}}{\sqrt{n}},$

are asymptotically normally distributed according to

$\frac{1}{(2\pi)^{(k^q - qk + q - 1)/2} \left( \prod_{a=1}^{q} \prod_{i=1}^{k} m_{ai} \right)^{(k^{q-1} - 1)/2}} \, e^{-Q/2},$

where

$Q = \sum_{g,i,\ldots,j=1}^{k} \frac{x_{gi \cdots j}^2}{m_{1g} m_{2i} \cdots m_{qj}}.$

The generalization of Pearson's $\chi^2$, namely Q, has the $\chi^2$-distribution with
$k^q - qk + q - 1$ degrees of freedom. Finally,

$s = \sum_{i=1}^{k} t_{ii \cdots i},$

the number of matched q-tuplets, under random formation of q-tuplets is
asymptotically normally distributed.


4. Matching cards of identical decks, each of n different cards. The prob- 
ability of obtaining a given number of pairs of matched cards under random 
pairing of two identical decks each of n different cards has been derived by Chap- 
man [3] by a straightforward method and, of course, the solution of the classical 
problem mentioned in the introduction is this probability. Another technique 
involving the use of the general expression for the number of matchings of two 
decks of arbitrary composition can be easily generalized to three or more decks.
Before discussing this method, let us derive this general expression first by the
use of the generating function discussed by Battin. Consider the multinomial

$(x_1 y_1 e^{\theta} + x_1 y_2 + \cdots + x_1 y_k + x_2 y_1 + x_2 y_2 e^{\theta} + \cdots + x_k y_k e^{\theta})^n.$

The coefficient of $e^{h\theta} x_1^{n_{11}} \cdots x_k^{n_{1k}} y_1^{n_{21}} \cdots y_k^{n_{2k}}$ (where k is the number of suits;
$n_{1i}$ the number of cards of suit $S_i$ in the first deck; $n_{2i}$ the number of cards of
suit $S_i$ in the second deck; and $n = \sum n_{1i} = \sum n_{2i}$) is the number of ways the
cards may be arranged so that there are h matchings. After expanding the
multinomial

$\left[ \sum x_i y_i e^{\theta} + \left( \sum x_i \right) \left( \sum y_i \right) - \sum x_i y_i \right]^n$

in powers of $x_i$ and $y_i$, taking the proper coefficient, and dividing by the total
number of ways the cards can be arranged, one arrives at the probability law of
h [4],


(6) P(h) = — : (-1)"*" (7) (" . . 


where 
(7) RS _e-X........ 
I] [(nuz — 8;)! (mex — 8;)! 8,1] 


where the summation is extended over all $s_i$, satisfying the following conditions:

$\sum_{i} s_i = n - g, \quad n_{1i} - s_i \ge 0, \quad n_{2i} - s_i \ge 0, \quad s_i \ge 0 \quad (i = 1, 2, \ldots, k).$

From (6) one can easily derive the distribution of the number of matchings
when two identical decks of n different cards are randomly paired. Let $n_{1i} = 1$,
$n_{2i} = 1$, and n = k. Then $T_g$, as defined in (7), is
- giym—g)!  _ nt — 
“ » onniy =a 1l0he ~ gin — gt 4 Nm — 9) 
for $s_i$ can equal 0 or 1 and there are $_nC_g$ choices of the 0's. Hence, we find the
probability of the number of matchings v to be

(8)    $P(v) = \frac{1}{v!} \sum_{j=0}^{n-v} \frac{(-1)^j}{j!}.$

This result has been given by Chapman [3]. It is, in fact, a classical probability
law.


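The moment generating function is

$\varphi(\theta) = \sum_{v=0}^{n} \sum_{j=0}^{n-v} \frac{(-1)^j e^{\theta v}}{v! \, j!} = \sum_{g=0}^{n} \frac{(e^{\theta} - 1)^g}{g!}.$

From this expression it is easy to verify that

$E(v) = 1, \qquad \sigma_v^2 = 1, \qquad E(v^{(r)}) = 1 \quad (r \le n),$

where $v^{(r)} = v(v - 1) \cdots (v - r + 1)$.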

It is interesting to observe that as n approaches infinity, the moment generating
function approaches

(9)    $\sum_{g=0}^{\infty} \frac{(e^{\theta} - 1)^g}{g!} = e^{e^{\theta} - 1}.$

It therefore follows that the limiting form of the distribution is the Poisson
distribution with parameter unity, namely,

(10)    $\frac{1^x e^{-1}}{x!} = \frac{1}{e \cdot x!}.$

If one writes the moment generating function as

(11)    $\varphi(\theta) = \sum_{g=0}^{n} \frac{\left( \theta + \frac{\theta^2}{2!} + \frac{\theta^3}{3!} + \cdots \right)^g}{g!},$

one can see that the first n powers of $\theta$ in (9) are the same as in (11). Hence,
the first n moments of the distribution (8) are the same as those of the Poisson
distribution (10). In particular it is interesting to observe that in the random
pairing of any two series, such as the serial numbers and order numbers in the
Selective Service drawing, the expected number of matchings is exactly 1.
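
The statements about (8) are easy to check with exact arithmetic. The sketch below assumes (8) has the classical form P(v) = (1/v!) times the sum of $(-1)^j/j!$ for j from 0 to n - v; the function name and the choice n = 8 are illustrative only.

```python
from fractions import Fraction
from math import exp, factorial

def p_matches(n, v):
    """Probability of exactly v matchings when two identical decks of
    n different cards are randomly paired (assumed form of equation (8))."""
    partial = sum(Fraction((-1) ** j, factorial(j)) for j in range(n - v + 1))
    return partial / factorial(v)

n = 8
dist = [p_matches(n, v) for v in range(n + 1)]

total = sum(dist)                                        # should be exactly 1
mean = sum(v * p for v, p in enumerate(dist))            # E(v)
var = sum(v * v * p for v, p in enumerate(dist)) - mean ** 2

# Largest pointwise difference from the Poisson law with parameter unity.
poisson = [exp(-1) / factorial(v) for v in range(n + 1)]
max_gap = max(abs(float(p) - q) for p, q in zip(dist, poisson))

print(total, mean, var, f"{max_gap:.2e}")
```

The mean and variance come out as exact rational numbers, so the agreement with the Poisson moments is not a rounding effect.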

In applications of this method of matching (e.g., matching individuals and 
handwriting), the experiment may be repeated several times. It would be de- 
sirable, therefore, to have the probability law of the mean of a sample. The 
exact distribution, however, is too complicated to use. It follows from the
central limit theorem that the mean of a sample of N observations from this
distribution is asymptotically normally distributed as $N \to \infty$. It can also be
shown by using the moment generating function that if the observations are from
distributions with different n (i.e., the i-th observation from a pair of decks of
$n_i$ cards, $n_i \ge 2$), the distribution of the mean of the sample is asymptotically
normal.

Now let us consider the analogue for three decks of cards. The generating
function [1] for the number of matchings of three cards, one from each of three
decks of arbitrary composition as defined in §3 is

$(x_1 y_1 z_1 e^{\theta} + x_1 y_1 z_2 + \cdots + x_1 y_1 z_k + \cdots + x_2 y_2 z_2 e^{\theta} + \cdots + x_k y_k z_k e^{\theta})^n.$

The probability of obtaining t matched triplets found after expanding this
expression is

(12)    $P(t) = \frac{\prod_{a=1}^{3} \prod_{i=1}^{k} n_{ai}!}{(n!)^3} \sum_{g=0}^{n-t} \binom{n}{t} \binom{n - t}{g} (-1)^{n-t-g} \, T_g,$

where

$T_g = \sum \frac{(g!)^3 (n - g)!}{\prod_{i=1}^{k} \left[ \prod_{a=1}^{3} (n_{ai} - s_i)! \cdot s_i! \right]},$

where

$\sum_{i=1}^{k} s_i = n - g, \quad s_i \ge 0, \quad n_{ai} - s_i \ge 0 \quad (a = 1, 2, 3; \; i = 1, 2, \ldots, k).$


To specialize (12) for the case to be considered here, namely, three identical
decks of n different cards each, we let

$n_{ai} = 1 \quad (a = 1, 2, 3; \; i = 1, 2, \ldots, k), \qquad n = k.$

Then, observing that

$T_g = (g!)^2 \, n!,$

one finds that the probability of t matchings is

(13)    $P(t) = \frac{1}{n! \, t!} \sum_{g=0}^{n-t} \frac{(-1)^{n-t-g} \, g!}{(n - t - g)!} = \frac{1}{n! \, t!} \sum_{j=0}^{n-t} \frac{(-1)^j (n - t - j)!}{j!}.$


The moment generating function is

(14)    $\varphi(\theta) = \frac{1}{n!} \sum_{g=0}^{n} \frac{(n - g)! \, (e^{\theta} - 1)^g}{g!}.$

One can readily verify that

(15)    $E(t) = \frac{1}{n}, \qquad \sigma_t^2 = \frac{n^2 - n + 1}{n^2 (n - 1)}.$

Since both E(t) and $\sigma_t^2$ approach 0, as n approaches infinity, by Tchebycheff's
inequality we can see that the probability approaches 1 that there will be no
matched triplet as n increases without bound. As in the case of two decks, the
result that the mean of a sample from this population is asymptotically normally
distributed follows from the central limit theorem.

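For the general case of q identical decks each of n different cards we can
generalize (13), (14), and (15) immediately. First, let us note that the probability
of s matched cards for q decks of arbitrary composition is

$P(s) = \frac{\prod_{a=1}^{q} \prod_{i=1}^{k} n_{ai}!}{(n!)^q} \sum_{g=0}^{n-s} \binom{n}{s} \binom{n - s}{g} (-1)^{n-s-g} \, T_g,$

where

$T_g = \sum \frac{(g!)^q (n - g)!}{\prod_{i=1}^{k} \left[ \prod_{a=1}^{q} (n_{ai} - s_i)! \cdot s_i! \right]},$

where

$\sum_{i=1}^{k} s_i = n - g, \quad s_i \ge 0, \quad n_{ai} - s_i \ge 0 \quad (a = 1, 2, \ldots, q; \; i = 1, 2, \ldots, k).$

The probability of w, the number of matchings when each of the q decks consists
of n different cards, is

$P(w) = \frac{1}{(n!)^{q-2} \, w!} \sum_{j=0}^{n-w} \frac{(-1)^j \, [(n - w - j)!]^{q-2}}{j!}.$

The moment generating function is

$\varphi(\theta) = \frac{1}{(n!)^{q-2}} \sum_{g=0}^{n} \frac{[(n - g)!]^{q-2} \, (e^{\theta} - 1)^g}{g!}.$

Finally, the mean and variance are

$E(w) = \frac{1}{n^{q-2}},$

$\sigma_w^2 = \frac{n^{q-2} (n - 1)^{q-2} + n^{q-2} - (n - 1)^{q-2}}{n^{2(q-2)} (n - 1)^{q-2}}.$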
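
The law for q identical decks can be checked against direct enumeration for small n. The sketch below assumes P(w) = [1 / ((n!)^(q-2) w!)] times the sum over j from 0 to n - w of $(-1)^j [(n - w - j)!]^{q-2} / j!$, with mean $1/n^{q-2}$; the function names are illustrative, and the brute-force routine is only practical for very small n and q.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def p_matches_q(n, q, w):
    """Probability of exactly w matched q-tuplets for q identical decks
    of n different cards (assumed closed form, q >= 2)."""
    s = sum(Fraction((-1) ** j * factorial(n - w - j) ** (q - 2), factorial(j))
            for j in range(n - w + 1))
    return s / (factorial(n) ** (q - 2) * factorial(w))

def brute_force(n, q):
    """Exact distribution by enumeration: the first deck is held in fixed
    order and every ordering of the other q - 1 decks is tried."""
    counts = [0] * (n + 1)
    perms = list(permutations(range(n)))
    for combo in product(perms, repeat=q - 1):
        w = sum(all(p[i] == i for p in combo) for i in range(n))
        counts[w] += 1
    total = len(perms) ** (q - 1)
    return [Fraction(c, total) for c in counts]

n, q = 4, 3
dist = [p_matches_q(n, q, w) for w in range(n + 1)]
mean = sum(w * p for w, p in enumerate(dist))
var = sum(w * w * p for w, p in enumerate(dist)) - mean ** 2
print(dist, mean, var)
```

With q = 2 the same function reduces to the two-deck law (8), since the factorial powers in the sum all become 1.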





5. Summary. Two distinct problems associated with card matching have 
been considered in this paper. In the first place it has been shown that the dis- 
tribution of the number of matched pairs obtained under conditions of random 
pairing of two decks of arbitrary composition is asymptotically normal when 
the number of cards in each deck approaches infinity and the proportion of cards 
in each suit remains fixed. This demonstration was extended to the cases of 
matchings between three or more decks. The second problem treated in the 
present paper is concerned with the matchings between identical decks, each of 
n different cards. The probability law for the case of two decks was derived by
the use of a generating function. When n approaches infinity the limiting
distribution was shown to be Poisson. The case of three or more decks was
treated in similar manner, with the probability law and the moments given.

NOTES


This section is devoted to brief research and expository articles, notes on methodology 
and other short items.

THE DETECTION OF DEFECTIVE MEMBERS OF LARGE
POPULATIONS


By ROBERT DORFMAN
Washington, D. C. 


The inspection of the individual members of a large population is an expensive 
and tedious process. Often in testing the results of manufacture the work can 
be reduced greatly by examining only a sample of the population and rejecting 
the whole if the proportion of defectives in the sample is unduly large. In many 
inspections, however, the objective is to eliminate all the defective members of 
the population. This situation arises in manufacturing processes where the 
defect being tested for can result in disastrous failures. It also arises in certain 
inspections of human populations. Where the objective is to weed out indi- 
vidual defective units, a sample inspection will clearly not suffice. It will be 
shown in this paper that a different statistical approach can, under certain con- 
ditions, yield significant savings in effort and expense when a complete elimina- 
tion of defective units is desired. 

It should be noted at the outset that when large populations are being in- 
spected the objective of eliminating all units with a particular defect can never 
be fully attained. Mechanical and chemical failures and, especially, man- 
failures make it inevitable that mistakes will occur when many units are being 
examined. Although the procedure described in this paper does not directly 
attack the problem of technical and psychological fallibility, it may contribute 
to its partial solution by reducing the tediousness of the work and by making 
more elaborate and more sensitive inspections economically feasible. In the 
following discussion no attention will be paid to the possibility of technical 
failure or operators’ error. 

The method will be described by showing its application to a large-scale pro- 
ject on which the United States Public Health Service and the Selective Service 
System are now engaged. The object of the program is to weed out all syphilitic 
men called up for induction. Under this program each prospective inductee is
subjected to a “Wasserman-type” blood test. The test may be divided
conveniently into two parts:

1. A sample of blood is drawn from the man,

2. The blood sample is subjected to a laboratory analysis which reveals the
presence or absence of “syphilitic antigen.” The presence of syphilitic
antigen is a good indication of infection.

When this procedure is used, N chemical analyses are required in order to 

detect all infected members of a population of size N. 

The germ of the proposed technique is revealed by the following possibility. 
Suppose that after the individual blood sera are drawn they are pooled in groups 
of, say, five and that the groups rather than the individual sera are subjected to
chemical analysis. If none of the five sera contributing to the pool contains 
syphilitic antigen, the pool will not contain it either and will test negative. If, 
however, one or more of the sera contain syphilitic antigen, the pool will contain 
it also and the group test will reveal its presence. The individuals making up 
the pool must then be retested to determine which of the members are infected. 
It is not necessary to draw a new blood sample for this purpose since sufficient 
blood for both the test and the retest can be taken at once. The chemical
analyses require only small quantities of blood. 
Two questions arise immediately: 
1. Will the group technique require fewer chemical analyses than the indi- 
vidual technique and, if so, what is the extent of the saving; and 
2. What is the most efficient size for the groups? 
Both questions are answered by a study of the probability of obtaining an
infected group. Let
    $p$ = the prevalence rate per hundred, that is the probability
        that a random selection will yield an infected individual.
Then
    $1 - p$ = the probability of selecting at random an individual free
        from infection.
And
    $(1 - p)^n$ = the probability of obtaining by random selection a group
        of n individuals all of whom are free from infection.
Then
    $p' = 1 - (1 - p)^n$ = the probability of obtaining by random selection a group
        of n with at least one infected member.
Further
    $N/n$ = the number of groups of size n in a population of size N,
so
    $p'N/n$ = the expected number of infected groups of n in a
        population of N with a prevalence rate of p.
The expected number of chemical analyses required by the grouping
procedure would be

    $E(T) = N/n + n(N/n)p'$

or the number of groups plus the number of individuals in groups which require
retesting.² The ratio of the number of tests required by the group technique to
the number required by the individual technique is a measure of its expected
relative cost. It is given by:

    $C = T/N = 1/n + p' = \frac{n + 1}{n} - (1 - p)^n.$





1 Diagnostic tests for syphilis are extremely sensitive and will show positive results for
even great dilutions of antigen.

2 The variance of T is $\sigma_T^2 = nNp'(1 - p') = nN[(1 - p)^n - (1 - p)^{2n}]$. The coefficient
of variation of T becomes small rapidly as N increases.

The extent of the savings attainable by use of the group method depends on the
group size and the prevalence rate. Figure 1 shows the shape of the relative
cost curve for five prevalence rates ranging from .01 to .15.³ For a prevalence
rate of .01 it is clear from the chart that only 20% as many tests would be
required by group tests with groups of 11 as by individual testing. The
attainable savings decrease as the prevalence rate increases, and for a prevalence
rate of .15, 72% as many tests are required by the most efficient grouping as by 
individual testing. The optimum group size for a population with a known 
prevalence rate is the integral value of n which has the lowest corresponding 
value on the relative cost curve for that prevalence rate. 


TABLE I
Optimum Group Sizes and Relative Testing Costs for Selected Prevalence Rates

Prevalence Rate | Optimum Group | Relative Testing | Percent Saving
  (per cent)    |     Size      |       Cost       |   Attainable
       1        |      11       |                  |
       2        |       8       |                  |

Optimum group sizes and their costs relative to the cost of individual testing
are given in Table I for selected prevalence rates.
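
The optimum group size for any prevalence rate follows directly from the relative cost formula. The sketch below uses C = 1/n + 1 - (1 - p)^n from the text; the function names and the search limit of 100 are illustrative assumptions.

```python
def relative_cost(p, n):
    """Expected tests per person with groups of size n at prevalence p:
    C = 1/n + p', where p' = 1 - (1 - p)**n."""
    return 1 / n + 1 - (1 - p) ** n

def optimum_group(p, max_n=100):
    """Integer group size (2 <= n <= max_n) with the lowest relative cost."""
    n = min(range(2, max_n + 1), key=lambda m: relative_cost(p, m))
    return n, relative_cost(p, n)

print("rate  size  cost%  saving%")
for rate in (0.01, 0.02, 0.05, 0.10, 0.15):
    size, cost = optimum_group(rate)
    print(f"{rate:4.2f}  {size:4d}  {100 * cost:5.1f}  {100 * (1 - cost):6.1f}")
```

If the relative cost at the optimum is not below 1, grouping brings no saving and individual testing is at least as cheap.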

This table, together with the description of the group testing technique as it 
might be applied to blood tests for syphilis, reveals the two conditions for the 
economical application of the technique: 

1. That the prevalence rate be sufficiently small to make worth while econo- 
mies possible; and 


3 The prevalence rate of syphilis among the first million selectees and volunteers was 
.0185 for whites and .2477 for other races. Geographically, the prevalence rate for whites 
ranged from .0505 in Arizona to .0051 in Wisconsin. See Parran, Thomas and Vonderlehr, 
R. A., Plain Words about Venereal Disease, Reynal and Hitchcock, New York.

2. That it be easier or more economical to obtain an observation on a group
than on the individuals of the group separately. 
Where these conditions exist, it will be more economical to locate defective mem- 
bers of a population by means of group testing than by means of individual 
testing. 

The principle of group testing may be applied to situations where the interest 
centers in the degree to which an imperfection is present rather than merely in 
its presence or absence. For example, it could be applied to lots of chemicals 
where it is desired to reject all batches with more than a certain degree of im- 
purity. If samples of a chemical are pooled and subjected to a single analysis, 
the degree of impurity in the pool will be the average of the impurities in the
separate samples. If the criterion were adopted that the members of a pool
would be examined individually whenever the proportion of impurity in the pool
is greater than 1/n-th the maximum acceptable degree of impurity, clearly no
excessively impure batches would get by. The extent of the saving accomplished
by this means can be computed by letting p’ equal the probability that the pool
will be impure enough to warrant retesting its constituent batches and using the
formulas given above. The probability, p’, can be calculated easily from the
probability distribution of impurities in the separate batches.

Fig. 1. Economies resulting from blood testing by groups. P.R. denotes prevalence
rate. (Vertical axis: lab analyses per hundred blood tests; horizontal axis: size of group.)

It is evident that this approach will produce worthwhile savings only if the 
limit of acceptability is liberally above the per cent of impurity encountered in 
the bulk of the batches. It is also evident that under this scheme many of the
retests will indicate that all the batches in the pool are acceptable and that the
retesting was not really needed. The criterion for retesting can be raised above 
1/n-th the limit of acceptability at the cost of a relatively small risk of accepting 
overly impure batches. The probability of failing to detect a defective batch 
when the retest criterion is raised in this manner will depend upon the form and 
parameters of the distribution of imperfection in single batches, as well as upon 
the number of batches in the pool. No simple general solution for this problem 
has been found. 


FURTHER POINTS ON MATRIX CALCULATION AND SIMULTANEOUS 
EQUATIONS 


By HAROLD HOTELLING


Columbia University 


Since the publication of “Some new methods in matrix calculation” in the
Annals of Mathematical Statistics (March, 1943, pp. 1-34), the following relevant 
points have come to the attention of the author. 

A. T. Lonseth has improved substantially the limit of error for the efficient 
method of inverting a matrix described on p. 14. He writes: 

“Your use of the ‘norm’ of a matrix in the Annals paper especially interests 
me, as I was recently led to use it in solving the errors problem for infinite 
linear systems which are equivalent to Fredholm-type integral equations. 

“It is possible to replace the term $p^{1/2}$ in your inequality (7.5) by one, so that

$$N(C_m - A^{-1}) \le N(C_0)\,k^{2^m}/(1 - k).$$

To see this, one observes that from the developments on the bottom of p. 13
it follows that $(I - D)^{-1} = I + D^*$, where $N(D^*) \le k/(1 - k)$. Then

$$C_0(I - D)^{-1} = C_0 + C_0 D^*,$$

so that

$$N[C_0(I - D)^{-1}] \le N(C_0) + N(C_0)\,N(D^*) = N(C_0)\{1 + N(D^*)\},$$

from which the result stated is seen to follow. I happen to have noticed
this because the same thing has cropped up often in my recent work, and for
the infinite case a bound $p^{1/2}$ is no bound at all.

“Your paper has suggested improvements in my own proofs, for which I
am grateful.”
Dr. Lonseth’s first formula above might well be written at the bottom of p. 14 
of my article as a substitute for (7.5). It both simplifies and reduces the limit 
of error. 
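A small numerical check of Dr. Lonseth's bound $N(C_m - A^{-1}) \le N(C_0)\,k^{2^m}/(1 - k)$ may be sketched as follows. The sketch assumes, from the earlier paper and not from this note, that the iteration is $C_{m+1} = C_m(2I - AC_m)$, that $D = I - AC_0$ with $k = N(D) < 1$, and that $N$ is the square root of the sum of squares of the elements. The matrix, its inverse, the starting approximation and all function names are illustrative choices only.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matsub(X, Y):
    """Elementwise difference X - Y."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def norm(X):
    """N(X): square root of the sum of squares of the elements."""
    return math.sqrt(sum(v * v for row in X for v in row))

def identity(p):
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

def error_and_bound(A, A_inv, C0, steps):
    """Return (m, N(C_m - A^{-1}), bound) for m = 0, ..., steps."""
    p = len(A)
    two_I = [[2.0 * v for v in row] for row in identity(p)]
    D = matsub(identity(p), matmul(A, C0))   # assumed definition D = I - A C_0
    k = norm(D)
    if k >= 1:
        raise ValueError("the bound requires N(D) < 1")
    rows, C = [], C0
    for m in range(steps + 1):
        err = norm(matsub(C, A_inv))
        bound = norm(C0) * k ** (2 ** m) / (1 - k)
        rows.append((m, err, bound))
        C = matmul(C, matsub(two_I, matmul(A, C)))  # C_{m+1} = C_m (2I - A C_m)
    return rows

# Illustrative 2 x 2 example with a known inverse and a crude diagonal start.
A = [[4.0, 1.0], [2.0, 3.0]]
A_inv = [[0.3, -0.1], [-0.2, 0.4]]
C0 = [[0.25, 0.0], [0.0, 1.0 / 3.0]]

for m, err, bound in error_and_bound(A, A_inv, C0, 4):
    print(m, err, bound)
```

In this example the printed error stays below the printed bound at every stage, and both shrink roughly as $k^{2^m}$.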

A method of solving normal equations by iteration, in which trial values of 
the unknown regression coefficients were applied to the values of the predictors 
and predictand in each of the N cases, and the results were used to improve the
trial values, was orally suggested by John C. Flanagan in 1934. The plan in- 
volved the use of punched cards for the N substitutions in the trial regression 
equation at each stage. However, it seemed on further consideration and dis- 
cussion that this would involve an unnecessarily large amount of work, since 
other methods require only as many substitutions at each stage as the number 
of unknowns, which is always less than N and usually very much less. I believe 
that Dr. Flanagan thereupon abandoned this plan and never published it. 

Louis I. Guttman has proposed a similar method,¹ and has provided a proof
of convergence in certain cases. In a final section he shows that the method can 
be modified by applying the same type of iterations to the normal, or product- 
sum, matrix instead of to the matrix of observations. This modification avoids 
the difficulty mentioned above. It is stated that one of these methods has been 
applied to a 64-variable problem. 

The first method of section 10 of my paper for solving sets of linear equations 
is equivalent, in the case of normal equations, to the second method of Dr. 
Guttman. It is regrettable that reference to his study was omitted. 

R. D. Gordon believes that the inequalities for principal components obtained 
at the end of the paper can be improved, but his entry into the army has pre- 
vented his fully working out his ideas. Paul A. Samuelson has some new and 
as yet unpublished ideas relating to calculation of principal components. 

Merrill M. Flood, in “A computational procedure for the method of principal
components,” Psychometrika, Vol. 5 (1940), pp. 169-172, presents a method which 
appears to have considerable value, in that the number of vector multiplications 
is relatively small. However it requires solution of a system of p — 1 linear 
equations for each latent vector determined, and also of an additional such 
system. The relative value of this and other methods may depend on the rela- 
tive costs of vector multiplication and of solving systems of linear equations. 
This in turn depends on the mechanical facilities available. 

Paul Horst’s paper, “A method for determining the coefficients of a characteristic
equation” (Annals of Mathematical Statistics, Vol. 6 (1935), pp. 83-84)
should have been referred to in connection with sections 11 and 12. 

On p. 23 of “Some new methods in matrix calculation,” in the sixth line from 
the bottom, smaller should be replaced by greater. On p. 32, the last expression 
in the third line should have 1; in place of r,. The last displayed formula on 
this page should read 


2 

—_ 2 
wi tes + we > 1 - eT” 
(m1 — Vit) 


and the subscript r + 1 in the next line should be k + 1. 
1 “An iterative method for multiple correlation,” The Prediction of Personal Adjustment,
by Paul Horst and collaborators, Social Science Research Council, New York, 1941, pp.
313-318.
NEWS AND NOTICES


Readers are invited to submit to the Secretary of the Institute news items of general interest 


Personal Items 


Dr. Paul H. Anderson is Regional Statistician with the War Production Board 
and Lecturer in Mathematics at Western Reserve University. 

Mr. Kenneth J. Arrow is a second lieutenant with the United States Army 
Air Forces. 

Assistant Professor H. M. Bacon of Stanford University has been promoted 
to an associate professorship. 

Dr. G. A. Baker of the College of Agriculture of the University of California 
has been promoted to an assistant professorship. 

Mr. Blair M. Bennett is attached to the Operations Research Section of the 
Eighth Bomber Command. 

Mr. Richard Berger is in the United States Army Air Forces. 

Mr. John L. Carlson is a lieutenant in the United States Naval Reserve. 

Mr. Edward P. Coleman is a major in the Coast Artillery Corps and is stationed
at West Point. 

Mr. William F. Elkin has been appointed Research Secretary of the Philadelphia
Tuberculosis and Health Association. 

Professor R. A. Fisher, Galton Professor in the University of London since 
1933, has been appointed to the chair of Genetics in Cambridge University. 

Dr. J. P. Guilford is a lieutenant colonel in the Army Air Forces. He is chief of 
the Field Research Unit, Psychological Section of the Surgeon’s Office with 
headquarters at Fort Worth. 

Dr. Edward Helly of Monmouth Junior College has been appointed Visiting
Lecturer at the Illinois Institute of Technology. 

Dr. H. B. Mann has been appointed to an instructorship at Bard College, 
Columbia University. 

Dr. Nilan Norris is a lieutenant in the Army Air Forces, and is serving as Sta- 
tistical Officer. 

Dr. Edwin G. Olds has been granted leave by Carnegie Institute of Technology 
to act as Chief Statistical Consultant to the Industrial Processes Branch of the 
Office of Production Research and Development, War Production Board. 

Miss Ruth L. Owen has been commissioned as an ensign in the United States 
Naval Reserve. She is acting as Supply and Disbursing Officer for the Naval 
V-12 Unit at St. Lawrence University. 

Mr. Robert W. Royston is a lieutenant in the United States Naval Reserve. 

Dr. H. M. Schwartz has been appointed Assistant Professor of Mathematics at 
the University of Idaho. 

Mr. William B. Simpson is now a member of the armed forces and is stationed 
at Camp Crowder. 

Mr. Irvin Stein is an ensign in the United States Naval Reserve. 


Mr. Milton S. Stevens is an apprentice seaman in the United States Naval
Reserve. 


Mr. W. A. Vezeau has been promoted to the rank of Assistant Professor of 
Mathematics at the University of Detroit. 


Organization of Washington Chapter of the Institute 


Professor Harold Hotelling of Columbia University spoke at George Wash- 
ington University, November 19, 1943, under the auspices of the Institute of 
Mathematical Statistics before an audience of over 150 persons. The subject of 
his lecture was Multivariate Statistical Analysis. At the close of the lecture the 
Washington Chapter of the Institute was organized. A Planning Committee 
consisting of William G. Madow, Chairman, Meyer A. Girshick, and W. Ed- 
wards Deming was elected. Members of the Institute who are interested in 
being in contact with the Washington Chapter should write to William G.
Madow, Bureau of Agricultural Economics, Department of Agriculture. 


New Members 
The following persons have been elected to membership in the Institute: 


Belz, Asso. Prof. Maurice H. M.A. (Melbourne) Univ. of Melbourne, Carlton, N. 3, 
Victoria, Australia. 

Capó, Bernardo G. Ph.D. (Cornell) Biometrician, Agric. Exp. Station, Rio Piedras,
Puerto Rico, 32} Rosario St., Santurce. 

Crawford, Elizabeth S. B.A. (Mundelein Coll.) Asso. Labor Market Analyst, War Man- 
power Commission. 935 Lincoln St., Denver, Colo. 

Crawford, James R. Div. Supervisor, Vega Aircraft Corp., 11626 Kittridge St., N. Hollywood,
Calif.

Hoffer, Prof. Irwin S. M.B.A. (Harvard) Temple Univ., Philadelphia, Pa., Willow Ave., 
Ambler, Pa. 

Maynard, Burton I. A.B. (Stanford) Stat. Analyst and Stat., 11211 Brookhaven Ave., 
REPORT ON THE NEW BRUNSWICK MEETING OF THE INSTITUTE


The Sixth Summer Meeting of the Institute of Mathematical Statistics was 
held at The New Jersey College for Women, Rutgers University, Sunday and 
Monday, September 12 and 13, 1943, in conjunction with the meetings of the 
American Mathematical Society and the Mathematical Association of America. 
The following fifty-two members of the Institute attended the meeting: 


T. W. Anderson, H. E. Arnold, K. J. Arnold, L. A. Aroian, B. M. Bennett, E. E. Blanche,
C. I. Bliss, A. H. Bowker, Hobart Bushey, W. G. Cochran, T. F. Cope, C. C. Craig, H. B.
Curry, J. H. Curtiss, J. F. Daly, Mary Elveback, W. Feller, R. M. Foster, J. A. Greenwood, 
J. I. Griffin, C. C. Grove, F. E. Grubbs, E. J. Gumbel, Harold Hotelling, Tjalling Koop- 
mans, H. G. Landau, Howard Levene, Simon Lopata, P. J. McCarthy, W. G. Madow, Mar- 
garet Martin, J. W. Mauchly, E. B. Mode, L. F. Nanni, C. O. Oakley, P. S. Olmstead, 
F. E. Satterthwaite, Bernice Scherl, H. M. Schwartz, L. W. Shaw, J. Shohat, Blanche Ska- 
lak, Mortimer Spiegelman, Arthur Stein, H. W. Steinhaus, A. W. Tucker, J. W. Tukey, 
D. F. Votaw, Abraham Wald, S. S. Wilks, Jacob Wolfowitz, Bertram Yood.


Professor S. S. Wilks of Princeton University acted as chairman for the Sunday
morning session. The following papers were presented: 


1. Some New Statistical Applications of Partitioned Matrices and Iterative Methods. 
Harold Hotelling, Columbia University 

2. On the Construction of Orthogonal Latin Squares. 
Henry B. Mann, Columbia University 


Dr. Jacob Wolfowitz, Columbia University, presided at the session on Sunday
afternoon. At this session the following papers were presented:


1. Recent Developments in the Statistical Analysis of Problems Requiring the Use of 
Vector Variates. 
W. G. Madow, Office of Price Administration. 

2. Statistical Inference when the Form of the Distribution Function is Unknown. 
Henry Scheffé, Princeton University. 


The session on Monday morning was held jointly with the American Mathe- 
matical Society. Professor C. C. Craig, University of Michigan, acted as chair- 
man, and the following contributed papers were read: 


1. Asymptotic Distributions of Ascending and Descending Runs.
Jacob Wolfowitz, Columbia University.

2. On the Plotting of Statistical Observations.
E. J. Gumbel, New School for Social Research.

3. On a Measure-Theoretic Problem Arising in the Theory of Non-Parametric Tests.
(Read by title.)
Henry Scheffé, Princeton University.

4. On a General Class of “Contagious” Distributions.
Will Feller, Brown University.

5. On the Statistical Treatment of Linear Stochastic Difference Equations.
H. B. Mann and Abraham Wald, Columbia University.

6. An Exact Test for Randomness in the Non-Parametric Case Based on Serial Correlation.
Abraham Wald and Jacob Wolfowitz, Columbia University.

On Saturday afternoon the members of the three societies were the guests of
Miss Margaret Trumbull Corwin, Dean of the College, New Jersey College for 
Women, at an informal reception at the Dean’s House. On Sunday evening an 
informal buffet supper for the mathematical organizations was served at Wood 
Lawn, the Alumnae House of the New Jersey College for Women. Later the 
same evening the Department of Music presented a Musicale in the Music 
Building. 

Edwin G. Olds,
Secretary 

REPORT ON THE SECOND MEETING OF THE PITTSBURGH
CHAPTER OF THE INSTITUTE 


The second meeting of the Pittsburgh Chapter of the Institute of Mathemati- 
cal Statistics was held at Carnegie Union, Carnegie Institute of Technology, on 
Saturday, October 9, 1943. Thirty-four persons attended the meeting, including 
the following eight members of the Institute: 

W. O. Clinedinst, G. G. Eldredge, K. L. Fetters, H. J. Hand, G. E. Niver, 
F. G. Norris, E. G. Olds, E. M. Schrock. 

At the morning session Mr. Charles E. Young, Westinghouse Electric and 
Manufacturing Company, presented a paper entitled “Analysis of Cyclical Fluc-
tuations.” The program for the afternoon session consisted of a paper entitled
“Use of orthogonal coordinates in linear regression,” presented by Mr. W. O. 
Clinedinst, National Tube Company. Mr. F. G. Norris, President of the Pitts- 
burgh Chapter, acted as chairman for both sessions. 

Howard Hand,
Secretary of the Pittsburgh Chapter 

ABSTRACTS OF PAPERS


(Presented Monday, September 13, 1943, at the New Brunswick Meeting 
of the Institute) 


Asymptotic Distributions of Ascending and Descending Runs. Jacob Wolfowitz,
Columbia University.


Let $a_1, a_2, \ldots, a_N$ be any permutation of $N$ unequal numbers. Let there be assigned to
each permutation the same probability. An element $a_i$ ($1 < i < N$) is called a turning point
if $a_i$ is greater than or less than both $a_{i-1}$ and $a_{i+1}$. Let $a_i$ and $a_{i+k}$ be consecutive turning
points; they are said to determine a “run” of length $k$. The author obtains the asymptotic
distributions of a large class of functions of these runs. An example of his results is the 
following: It is proved that the following are asymptotically normally distributed: (a) 
the total number of runs; (b) R(p), the number of runs of length p; (c) R(p) and R(q) 
jointly. Similar results are obtained for runs defined by any of a large set of criteria, of 
which the one given above is of value in statistical applications. 
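The definition of turning points and run lengths above can be illustrated by a short sketch. The function names are hypothetical, and treating the first and last elements as closing the end runs (the `include_ends` option) is an assumption; the abstract speaks only of runs between consecutive turning points.

```python
from collections import Counter

def turning_points(a):
    """0-based interior indices i at which a[i] is greater than, or less than,
    both of its neighbours."""
    return [i for i in range(1, len(a) - 1)
            if (a[i] > a[i - 1] and a[i] > a[i + 1])
            or (a[i] < a[i - 1] and a[i] < a[i + 1])]

def run_lengths(a, include_ends=True):
    """Lengths k of the runs determined by consecutive turning points.

    With include_ends=True (an assumption, not stated in the abstract) the
    first and last elements are treated as if they were turning points, so
    that the runs cover the whole sequence.
    """
    if len(a) < 2:
        return []
    points = turning_points(a)
    if include_ends:
        points = [0] + points + [len(a) - 1]
    return [j - i for i, j in zip(points, points[1:])]

def runs_of_each_length(a, include_ends=True):
    """R(p) for every run length p that occurs, as a Counter."""
    return Counter(run_lengths(a, include_ends))

sequence = [1, 3, 2, 5, 6, 4]
print(turning_points(sequence))       # indices of the turning points
print(run_lengths(sequence))          # run lengths, ends included
print(runs_of_each_length(sequence))  # counts R(p)
```

With the ends included, the run lengths of a sequence of N unequal numbers always sum to N - 1, which is a convenient check on the bookkeeping.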
On the Plotting of Statistical Observations. E. J. Gumbel, The New School
for Social Research.


It is well known that there exist two step functions corresponding to a continuous vari-
ate. We may attribute to the $m$-th observation the ranks $m$ or $m - 1$. To obtain one and
only one serial number $m$, which will, in general, not be an integer, we attribute to $x_m$ an ad-
justed frequency $m - \Delta$, namely the probability of the most probable $m$-th value. The
correction $\Delta$ for the rank thus introduced depends upon the distribution. If the variate
is unlimited and possesses a mode, $\Delta$ increases for increasing values of the variate from zero
up to unity. The correction is important for small numbers of observations. For large
numbers of observations and for the ogive it is sufficient to choose $\Delta = \tfrac{1}{2}$. The calculation
of $\Delta$ allows a correct plotting of all observations (including the first and last) on probability
paper (equiprobability test). For the return periods, the ranks $m$ and $m - 1$ correspond
to the observed exceedance and recurrence intervals. The correction $\Delta$ leads to adjusted
return periods which pass for increasing values of the variate from the exceedance to the
recurrence intervals, provided the variate is unlimited and possesses a single mode. The
asymptotic standard error of the partition values may be used to construct confidence bands
for the ogive, the equiprobability test, and the return periods. This control for the fit
between theory and observation may be applied to all observations which are not extreme.
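As a rough illustration of the rank correction, the sketch below computes plotting positions for n ordered observations with a constant correction, taken as one half by default (the large-sample value mentioned above). Reading the adjusted frequency as the cumulative relative frequency (m - correction)/n is an assumption made for this sketch, as is the function name; the abstract itself lets the correction vary with the variate.

```python
def plotting_positions(n, delta=0.5):
    """Cumulative relative frequencies (m - delta) / n for m = 1, ..., n.

    delta is the rank correction, assumed constant here; delta = 0 and
    delta = 1 reproduce the two uncorrected step functions.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta is expected to lie between zero and unity")
    return [(m - delta) / n for m in range(1, n + 1)]

print(plotting_positions(4))             # correction of one half
print(plotting_positions(4, delta=0.0))  # rank m
print(plotting_positions(4, delta=1.0))  # rank m - 1
```

With the correction strictly between zero and unity neither the first nor the last observation falls at 0 or 1, so every observation can be placed on probability paper.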


On a Measure-Theoretic Problem Arising in the Theory of Non-Parametric
Tests. Henry Scheffé, Princeton University.


Let $F(x)$ be the cumulative distribution function of a univariate population. Denote a
sample from the population by the sample point $E = (x_1, x_2, \ldots, x_k)$, and let $w$ be a Borel
region in the sample space. How can we characterize $w$ in order that $\Pr\{E \text{ in } w\}$ be inde-
pendent of $F(x)$ for all $F$ in a given class of distribution functions? For various classes of
$F$ necessary conditions and sufficient conditions are found. For example, if the boundary
of $w$ is a null set, a necessary and sufficient condition for $w$ to have the desired property for
all absolutely continuous $F(x)$ is that it have the following structure except on a null set:
For every point $E$ in the sample space, $M$ of the $k!$ points obtained by permuting the co-
ordinates of $E$ are in $w$ and the remaining $k! - M$ are not ($0 \le M \le k!$).
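The permutation structure described above can be illustrated with a small sketch. The region chosen here, "the first coordinate is the smallest", with k = 3, is only an example; for it, M = 2 of the 3! = 6 permutations of any point with distinct coordinates lie in the region, so the probability should be M/k! = 1/3 whatever continuous distribution is sampled. The function names, sample sizes and seeds are assumptions of the sketch, and the simulation gives estimates only.

```python
import itertools
import random

def count_permutations_in_region(point, in_region):
    """M: how many of the k! coordinate permutations of `point` lie in the region."""
    return sum(1 for q in itertools.permutations(point) if in_region(q))

def first_is_smallest(q):
    """Example region w: the first coordinate is smaller than all the others."""
    return q[0] < min(q[1:])

def estimate_probability(draw, in_region, k, trials, seed):
    """Monte Carlo estimate of Pr{E in w} for samples of size k drawn by `draw`."""
    rng = random.Random(seed)
    hits = sum(in_region(tuple(draw(rng) for _ in range(k)))
               for _ in range(trials))
    return hits / trials

M = count_permutations_in_region((0.7, 0.2, 1.9), first_is_smallest)
print(M, "of 6 permutations lie in the region")

uniform = estimate_probability(lambda r: r.random(), first_is_smallest, 3, 20000, 1)
exponential = estimate_probability(lambda r: r.expovariate(1.0), first_is_smallest, 3, 20000, 2)
print(uniform, exponential)  # both estimates should be near M / 6
```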


On a General Class of “Contagious” Distributions. W. Feller, Brown University.


This paper is concerned with some properties of a class of contagious distributions which 
contains, among others, some distributions studied by Greenwood and Yule, Polya, and 
Neyman, respectively. 


On the Statistical Treatment of Linear Stochastic Difference Equations.
H. B. Mann and A. Wald, Columbia University.


For any integer $t$ let $x_{1t}, \ldots, x_{rt}$ be a set of $r$ random variables which satisfy the system
of linear stochastic difference equations

$$\sum_{j=1}^{r} \sum_{k=0}^{p_{ij}} \alpha_{ijk}\, x_{j,t-k} + \alpha_i = \epsilon_{it} \qquad (i = 1, \ldots, r).$$

The coefficients $\alpha_{ijk}$ and $\alpha_i$ are (known or unknown) constants and the vectors
$\epsilon_t = (\epsilon_{1t}, \ldots, \epsilon_{rt})$ ($t = 1, 2, \ldots$, ad inf.) are independently distributed random vectors
each having the same distribution. It is assumed that $E(\epsilon_{it}) = 0$. The problem dealt with
in this paper is to estimate the unknown coefficients $\alpha_{ijk}$ and $\alpha_i$ on the basis of $Nr$
observations $x_{it}$ ($i = 1, \ldots, r$; $t = 1, \ldots, N$). The statistics used as estimates of the
unknown coefficients are identical with the maximum likelihood estimates if $\epsilon_t$ is normally
distributed. The joint limiting distribution of these estimates is obtained without assuming
normality of the distribution of $\epsilon_t$.
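For the simplest special case, one variable and one lag, the estimation problem can be sketched as an ordinary least-squares fit of each observation on its predecessor. This is an illustration under assumptions not stated in the abstract: the coefficient of the current value is normalized to one, and the estimates are taken to be the least-squares ones, which for a single equation of this form coincide with the (conditional) maximum likelihood estimates under normal errors. The function name is hypothetical.

```python
def fit_first_order(x):
    """Least-squares fit of x[t] = b * x[t-1] + c + error.

    Returns (b, c). In the difference-equation form x[t] - b*x[t-1] - c = error,
    the lag coefficient is -b and the constant term is -c.
    """
    previous, current = x[:-1], x[1:]
    n = len(current)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxx = sum((u - mean_prev) ** 2 for u in previous)
    if sxx == 0:
        raise ValueError("lagged values are constant; slope is not determined")
    sxy = sum((u - mean_prev) * (v - mean_curr)
              for u, v in zip(previous, current))
    b = sxy / sxx
    c = mean_curr - b * mean_prev
    return b, c

# Noise-free illustrative series generated by x[t] = 0.5 * x[t-1] + 1, x[0] = 0.
series = [0.0]
for _ in range(9):
    series.append(0.5 * series[-1] + 1.0)

print(fit_first_order(series))  # close to (0.5, 1.0) for this error-free series
```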


An Exact Test for Randomness in the Non-Parametric Case Based on Serial
Correlation. A. Wald and J. Wolfowitz, Columbia University.


Let $X_1, \ldots, X_n$ be $n$ chance variables, about the distribution of which nothing is known.
Let the problem be to test the (null) hypothesis that $X_1, \ldots, X_n$ are independently dis-
tributed with the same distribution function. It is shown that an exact test of this hypoth-
esis based on the serial correlation coefficient can be made. For this purpose the distri-
bution of the serial correlation coefficient in the sub-population consisting of all possible
permutations of the observed values is employed. Under the null hypothesis, this distribu-
tion is independent of the distribution function of $X_i$ ($i = 1, \ldots, n$). Several exact mo-
ments are obtained and asymptotic normality is proved.
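The permutation idea can be sketched for a very small sample by enumerating every ordering of the observed values. The sketch uses the lag-one circular product sum, in which the last value is paired with the first; both the lag and the circular definition are assumptions here. Because the mean and variance of the values are the same in every permutation, ranking permutations by this product sum is equivalent to ranking them by the corresponding serial correlation coefficient. Function names are hypothetical, and full enumeration is practical only for small n.

```python
import itertools

def circular_serial_product(x):
    """Sum of x[i] * x[i+1], with the last value paired with the first."""
    n = len(x)
    return sum(x[i] * x[(i + 1) % n] for i in range(n))

def exact_upper_p_value(x):
    """Share of all n! orderings whose statistic is at least the observed one."""
    observed = circular_serial_product(x)
    total = 0
    at_least = 0
    for ordering in itertools.permutations(x):
        total += 1
        if circular_serial_product(ordering) >= observed:
            at_least += 1
    return at_least / total

trend = [1, 2, 3, 4, 5, 6]
print(circular_serial_product(trend))
print(exact_upper_p_value(trend))
```

A small value of the returned share indicates a larger serial product than most orderings of the same values would give, which is evidence against randomness in the direction of positive serial dependence.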
Brown, Bryan, Carlton, G. J. Cox,
Curtiss, Day, Deming, Dorfman, W. D.
Evans, Fadner, Flaherty, Frankel,
Fraser, Friedman, Girshick, Green-
wood, Greville, Hagood, Hansen, Hoy,
Hurwitz, Jenss, Juran, Kantorovitz,
Kennedy, G. B. King, Knudsen, Koop-
mans, Kullback, Kury, Lancaster,
Landau, Lieberman, Luykx, Madow, 
Marcuse, Morrison, Myers, Nisselson, 
K. A. Norton, Osborne, O’Toole, Perlo, 
Rosenblatt, Salkind, Sandomire, Sa- 
suly, Segal, W. A. Shelton, Shulman, 
J. H. Smith, Stephan, Tepping, Tuttle, 
Vickery, Weida. 


FLORIDA (3)

LAKE CITY. Chapman.
ORLANDO. Kwerel, Upholt.


GEORGIA (2) 


ATLANTA. Barnes, Pierce. 


IDAHO (1)

MOSCOW. Schwartz.


ILLINOIS (22)

CHICAGO. Bartky, Brooks, Helly, Householder, Hurwicz, Lange, Leavens,
Mansfield, Marschak, Steinberg, L. R. Tucker, Wright, Yntema.
EVANSTON. Hildebrandt.
GALESBURG. Morton.
JACKSONVILLE. Lukacs.
ROCK ISLAND. Cederberg.
URBANA. Crathorne, Gibson, Springer, Welker.
WILMETTE. Wescott.


INDIANA (2) 


INDIANAPOLIS. L. T. E. Thompson.
WEST LAFAYETTE. Burr.


IOWA (9)

AMES. Bancroft, Cochran, A. J. King, Perotti, Snedecor.
DES MOINES. Groth.
IOWA CITY. A. T. Craig, Knowler, Rietz.


KANSAS (2)

MANHATTAN. Fryer, White.


KENTUCKY (2)

LEXINGTON. South.
LOUISVILLE. Stevenson.


MARYLAND (26)

ABERDEEN. Bechhofer, Norden, Reno, Schrock, A. Stein, Tomlinson.
ABERDEEN PROVING GROUND. A. A. Bennett, Grubbs, B. I. Hart, Kent, Shaw,
L. E. Simon, Stergion.
BALTIMORE. Crosby, Merrell, Reed, Siegeltuch, M. M. Torrey.
BETHESDA. Dorn, Rosander.
CHEVY CHASE. Olshen.
GLEN ECHO. Doob.
HAVRE DE GRACE. Nekrassoff, Zuckerman.
HYATTSVILLE. Gurney.
SUITLAND. Sternberg.


MASSACHUSETTS (17)

ADAMS. I. Stein.
BOSTON. Gifford, Mode.
CAMBRIDGE. Freeman, C. H. Gordon, Huntington, Kaplansky, Kelley, Langmuir,
Mises, Rulon, Samuelson, E. W. Wilson.
SPRINGFIELD. Young.
WALPOLE. Secrist.
WEST ROXBURY. Klein.
WORCESTER. O’Callahan.


MICHIGAN (24)

ANN ARBOR. C. A. Bennett, Carver, Copeland, Cotterman, C. C. Craig, Dwyer,
Fisher, Nesbitt, Raiford, W. P. Wilson.
DETROIT. Henry, Katz, Levin, Pixley, Reitz, G. W. Thomson, Vezeau.
EAST LANSING. Baten, Burrows, Dressel, Speeker.
FLINT. Swanson.
KALAMAZOO. Northam.
LANSING. Naugle.


MINNESOTA (13)

MINNEAPOLIS. Fattu, W. L. Hart, D. Jackson, P. O. Johnson, Kozelka, Mayer,
Moore, Mudgett, Treloar.
ROCHESTER. Gage, Miner.
ST. PAUL. Thornton, Waite.


MISSISSIPPI (2)

STATE COLLEGE. Ollivier.
UNIVERSITY. Bickerstaff.


MISSOURI (5)

CAMP CROWDER. Simpson.
ST. LOUIS. Baldwin, Feldman, Regan, Rider.


MONTANA (1)

BOZEMAN. Livers.


NEW HAMPSHIRE (2)

DURHAM. K. J. Arnold, Kichline.


NEW JERSEY (31)

BAYONNE. Yood.
BUTLER. Payne.
DOVER. A. C. Cohen, Cohn.
EAST ORANGE. Molina.
ESSEX FELLS. Olmstead.
KEARNEY. Webster.
MADISON. Battin.
MONTCLAIR. Murphy.
MOUNTAIN LAKES. Hotelling, Shewhart.
MURRAY HILL. Head.
NEWARK. Neifeld.
NEW BRUNSWICK. Walter.
PRINCETON. T. W. Anderson, G. W. Brown, Flood, Mosteller, Nanni, Scheffé,
Singleton, A. W. Tucker, Tukey, Votaw, Wilks, Winsor.
SUMMIT. Nelson.
UNION. Di Salvatore.
WESTFIELD. Foster.
WEST NEW YORK. B. A. Gottfried, D. K. Gottfried.


New Mexico (1) 


ALBUQUERQUE. Larsen. 


NEW YORK (113)

ALBANY. Malzberg, W. R. Thompson.
ANNANDALE-ON-HUDSON. Mann.
BALDWIN. Grove.
BEACON. H. K. Hammer.
BRONX. Dutka, Laderman, Solomon, Zwerling.
BROOKLYN. Cloudman, M. S. Cohen, Griffin, Lopata, Morrison, Painter, Sells,
Stewart, Weiner.
BUFFALO. Blanche, Brumbaugh, Jolliffe, Kavanagh, E. B. Olds, Shephard, Ullman.
CANTON. Bates, Owen.
FLUSHING. Cope, Sard.
FOREST HILLS. Haines.
GREAT NECK. O’Connor.
ITHACA. J. B. Cohen, Durand, Guttman.
LARCHMONT. Cureton.
LITTLE NECK. S. Robinson.
NEW YORK. Alt, Aroian, Bachelor, Bailey, Bassford, Blackadar, Boozer, Boschan,
Bowker, Brookner, Burgess, Bushey, Court, Daly, Dodge, Edwards, Eisenhart,
Elveback, Fertig, Fry, Goode, R. D. Gordon, Gumbel, Haavelmo, Hilfer, Jablon,
Levene, Levine, Levy, Lew, Li, Lorge, Lotka, Martin, P. J. McCarthy, Neurath,
Noether, N. Norris, Paulson, Peterson, Preinreich, Preston, Ratkowitz, Riordan,
W. S. Robinson, Romig, Roos, Schapiro, Scherl, L. G. Simon, Skalak, Spiegelman,
Steinhaus, Stevens, M. N. Torrey, Wald, Walker, Wallis, Wilkinson, Wolfowitz,
Zeiger, Zubin.
PARKCHESTER. Sternhell.
PORT WASHINGTON. Kimball.
POUGHKEEPSIE. Hopper.
RICHMOND HILL. Spaney.
ROCHESTER. Lengyel.
ROCKVILLE CENTER. Berger.
SCHENECTADY. Wareham.
SUFFERN. Rule.
TENAFLY. Williams.
TROY. Van Winkle.
WANTAGH. F. E. Smith.
WEST POINT. E. P. Coleman, Guard.
YONKERS. Youden.


NORTH CAROLINA (8)

CHAPEL HILL. LeLeiko.
DURHAM. Schumacher.
PAW CREEK. Elting.
RALEIGH. R. L. Anderson, Cell, Clarkson, G. M. Cox, Hendricks.


OHIO (12)

CLEVELAND. P. H. Anderson, Cowan, Kramer, Motock.
COLUMBUS. L. E. Smart, Toops.
OXFORD. Pollard.
SOUTH EUCLID. Godfrey.
STEUBENVILLE. F. G. Norris.
TOLEDO. Mummery.
WELLINGTON. Ruger.
YOUNGSTOWN. Fetters.


OKLAHOMA (2) 


CLINTON. Carlson.
NORMAN. Dixon.


OREGON (2) 


CORVALLIS. P. C. Hammer.
EUGENE. Kossack.


PENNSYLVANIA (26)

AMBLER. Hoffer.
BETHLEHEM. Passano.
BRYN MAWR. Geiringer.
DRAVOSBURG. Sturtevant.
EAST PITTSBURGH. Manuele.
HATBORO. O. M. Smart.
HAVERFORD. Oakley.
LEWISBURG. Richardson.
OAKMONT. Petrie.
PHILADELPHIA. Curry, Elkin, Mauchly, Watson.
PITTSBURGH. Bernstein, Blackburn, Clinedinst, Eldredge, Elkins, Hand, Hebley,
Junge, E. G. Olds.
STATE COLLEGE. E. Johnson, Wagner.
UPPER DARBY. Shohat, Simmons.


RHODE ISLAND (1)

PROVIDENCE. Feller.


SOUTH CAROLINA (1)


SPARTANBURG. Ripandelli. 


TENNESSEE (1) 


NASHVILLE. Densen. 


TEXAS (5)

COLLEGE STATION. Hamilton.
DALLAS. Mouzon.
FORT WORTH. Deemer, Horst.
PECOS. Van Voorhis.


UTAH (1)

SALT LAKE CITY. Bridger.


VERMONT (1)

BENNINGTON. Lundberg.


VIRGINIA (22)

ALEXANDRIA. Gause, Jeming, Mood.
ARLINGTON. Buros, Danzig, Graves, Jacobs, Konijn, H. W. Norton, Schultz,
W. S. Shelton, Stephan, Welsh.
BLACKSBURG. Harshbarger, Page, W. H. Thompson, Tyler.
DAHLGREN. Dresch, Gutzman.
FORT BELVOIR. Peiser.
RICHMOND. J. B. Coleman.
VIENNA. Brandt.


WASHINGTON (4)

PULLMAN. Vatnsdal.
SEATTLE. Birnbaum, Chang.
WOODINVILLE. Anthony.


WEST VIRGINIA (1)

HOLLIDAYS COVE. Johner.


WISCONSIN (10)

EAU CLAIRE. Heide.
MADISON. Barr, H. P. Evans, Fox, Ingraham, Kenney, Little, McCormick,
Rosenthal.
MILWAUKEE. Peach.


FOREIGN MEMBERS


ARGENTINA (4)

BANFIELD. Acerboni.
BUENOS AIRES. Barral-Souto, Kafuri.
ROSARIO. Dieulefait.


AUSTRALIA (1)

MELBOURNE, VICTORIA. Belz.


BRAZIL (4)

RIO DE JANEIRO. DeCastro, Felippe, Kingston, Silveira.


CANADA (9)

CHATHAM, ONTARIO. Beall.
EDMONTON, ALBERTA. Keeping.
KINGSTON, ONTARIO. Edgett.
OTTAWA, ONTARIO. Cudmore.
TORONTO, ONTARIO. De Lury, Gurland, R. W. B. Jackson, Wolfenden.
WOLFVILLE, NOVA SCOTIA. Macphail.


CUBA (2)

HAVANA. Montes, Reyes.


ENGLAND (4)

BUSHEY, HERTFORDSHIRE. Kendall.
ILFRACOMBE, DEVON. Perryman.
LONDON. Pearson.
MANCHESTER. Ross.


GUATEMALA (1)

GUATEMALA CITY. Arias B.


IRELAND (1)

CORK. M. D. McCarthy.


PERU (1)

LIMA. Sialer.


PUERTO RICO (1)

SANTURCE. Capó.


SCOTLAND (1)

EDINBURGH. G. H. Thomson.


URUGUAY (1)

MONTEVIDEO. Mazza.


THE INSTITUTE OF MATHEMATICAL STATISTICS
(Organized September 12, 1935)


OFFICERS FOR 1943

President:
Cecil C. Craig, University of Michigan, Ann Arbor, Michigan

Vice-Presidents:
W. E. Deming, Bureau of the Census, Washington
A. Wald, Columbia University, New York

Secretary-Treasurer:
E. G. Olds, Carnegie Institute of Technology, Pittsburgh, Pa.


The purpose of the Institute of Mathematical Statistics is to stimulate
research in the mathematical theory of statistics and to promote coöperation
between the field of pure research and the fields of application.


Membership dues including subscription to the Annals of Mathematical
Statistics are $5.00 per year. The dues and inquiries regarding member-
ship in the Institute should be sent to the Secretary-Treasurer of the
Institute.


WASHINGTON MEETING OF THE INSTITUTE 


A meeting of the Institute of Mathematical Statistics will be held in
Washington, D. C., on April 27-28, 1944. Besides invited addresses, there 
will be a meeting at which contributed papers will be presented. Abstracts 
of contributed papers should be sent to the Secretary before March 20, 1944. 





